Abstract. Distributed Nearest Neighbor Search (DNNS) locates service nodes that have the shortest interactive delay towards requesting hosts. DNNS provides an important service for large-scale, latency-sensitive networked applications, such as VoIP, online network games, or interactive network services on the cloud. Existing work assumes delays to be symmetric, which does not generalize to applications that are sensitive to one-way delays, such as multimedia video delivery from servers to hosts. We propose a relaxed inframetric model for the network delay space that assumes neither the triangle inequality nor delay symmetry. We prove that DNNS requests can be completed efficiently if the delay space exhibits modest inframetric dimensions, which we observe empirically. Finally, we propose a DNNS method named HybridNN (Hybrid Nearest Neighbor search) based on the inframetric model for fast and accurate DNNS. For DNNS requests, HybridNN chooses closest neighbors accurately via the inframetric modelling, and scalably by combining delay predictions with direct probes to a pruned set of neighbors. Simulation results show that HybridNN locates the nearest neighbor nearly optimally. Experiments on PlanetLab show that HybridNN can provide accurate nearest neighbors that are close to optimal with modest query overhead and maintenance traffic.



I. Introduction 

Latency-sensitive applications, such as P2P-based VoIP and IPTV [1], interactive network services on the cloud (e.g., Office Live Workspace [2], Google Maps [3]), and online network games, need to transmit data in real time from geo-distributed servers (called service nodes) to many hosts. High transmission delays reduce the Quality of Experience (QoE) of users [4], which leads to significant business losses [5]. For instance, Google reports that its revenue decreases by 20% when the latency of showing search results increases by 500 ms; similarly, Amazon claims that its sales decrease by 1% if the page-response latency increases by 100 ms [4].

Since hundreds or thousands of service nodes provide identical services to hosts, there is an increasing push for service providers to route real-time data to a host from the geo-distributed servers that are nearest to that host. For example, Google routes users' search queries to geographically nearby servers [6]; Akamai redirects hosts' content requests to replica servers mainly based on proximity [7]; CoralCDN [8] uses OASIS [9] and DONAR [10] to select proxy servers near end hosts based on geographic distances. However, selecting the nearest servers for hosts is still far from standard practice due to several challenges.
Fig. 1. Illustrating the RTT and OWDs. Suppose B and C are two servers that are able to supply short videos to host A. If we use the RTT metric to minimize the delay of video delivery, we may arbitrarily choose either of them to send videos to host A, since the RTT between A and B and the RTT between A and C are both 300 ms. However, since the video files are transmitted from servers to hosts, the OWDs from servers to hosts are more important [16]. The OWD from server C to host A is one quarter of that from server B to host A. Therefore, choosing server C to serve host A significantly reduces the content transmission delay for host A, which is feasible only when we use the OWD metric for delay optimization.



First, selecting nearest servers must be reliable, since service providers need to ensure QoE fairly for all hosts. Selecting nearest servers using proximity coordinates [11], [12] or geographic distances [9] suffers from the mismatch between estimated delays and real-world delays [6], which makes the selection accuracy hard to predict. On the other hand, selecting nearest servers using distributed search such as Meridian [13] or OASIS [9] avoids such mismatch problems by using direct probes, but may terminate at service nodes that are much worse than the nearest ones, since the search is easily trapped in local minima due to the clustering [14] and Triangle Inequality Violation (TIV) [15] properties of the delay space.

Second, selecting nearest servers must be aware of unidirectional delays whenever possible. Since routing on the Internet is asymmetric [16], the delays from servers to hosts may differ from those in the reverse direction by a factor of several. Furthermore, One-Way Delay (OWD) measurements are becoming increasingly practical due to advances in measurement techniques such as OWAMP [17] or Reverse Traceroute [18]. However, delay optimizations using Round Trip Time (RTT) ignore such delay asymmetry. For multimedia streaming, application-level multicast, or more general applications where data flows in one direction, such unawareness of unidirectional delays degrades the effectiveness of the selected servers, as shown in Fig. 1.
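
The selection difference in Fig. 1 can be reproduced with a few lines of Python. This is a minimal sketch: the one-way delays are hypothetical values chosen only so that both RTTs equal 300 ms and the C-to-A OWD is a quarter of the B-to-A OWD, and the helper names are illustrative rather than part of HybridNN.

```python
# Hypothetical one-way delays (ms), keyed by (source, destination); chosen so
# that both RTTs are 300 ms while C->A is a quarter of B->A, as in Fig. 1.
OWD = {
    ("B", "A"): 240.0, ("A", "B"): 60.0,
    ("C", "A"): 60.0, ("A", "C"): 240.0,
}

def rtt(u, v):
    """Round-trip time as the sum of the two one-way delays."""
    return OWD[(u, v)] + OWD[(v, u)]

def pick_by_rtt(servers, host):
    """Minimise RTT; on a tie, min() keeps the first server listed (an arbitrary choice)."""
    return min(servers, key=lambda s: rtt(s, host))

def pick_by_owd(servers, host):
    """Minimise the server-to-host one-way delay, the direction the video travels."""
    return min(servers, key=lambda s: OWD[(s, host)])

servers = ["B", "C"]
print(pick_by_rtt(servers, "A"), rtt("B", "A"), rtt("C", "A"))  # B 300.0 300.0
print(pick_by_owd(servers, "A"))                                 # C
```

The RTT-based choice is decided only by tie-breaking, whereas the OWD-based choice picks C for the reason given in the caption.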

Third, selecting nearest servers must find a good tradeoff between response time and timeliness. The response time lasts several seconds for server selections using on-demand probing such as Meridian [13] or OASIS [9]. Such response times degrade the QoE of users in latency-sensitive applications, such as online workspaces or online music. OASIS caches nearest servers for each IP prefix using in-advance probes once a week, which yields better response time. However, the cached server selections tend to be suboptimal, since delays vary due to routing dynamics or server workloads [19], and service nodes may be added or removed dynamically. Therefore, it is difficult to find a good tradeoff between response time and the timeliness of server selections.

The goal of this paper is to provide new algorithms to address the first two challenges. To this end, we develop a sufficiently general delay model that captures the major statistics of the delay space, including TIV, delay dynamics and delay asymmetry. This paper makes three contributions.

First, we analytically demonstrate that we can quickly find approximately nearest servers by iteratively searching for nodes closer to the host, using nodes sampled from the proximity region of each node. However, the analytical method requires a large number of samples, which does not scale well.

Second, we introduce a novel distributed algorithm, named HybridNN, that finds the nearest service nodes for any machine on the Internet (called a target). This algorithm derives from our analytical method and preserves its accuracy and speed, while offering better dynamic adaptation and lower measurement costs.

(i) Dynamic adaptation. A practical DNNS algorithm needs to proactively maintain a moderate number of service nodes as samples for DNNS queries, irrespective of the system dynamics. HybridNN dynamically maintains such neighbors using the concentric rings used in Meridian [13] or OASIS [9], with two improvements:

• The maximum number of nodes stored per ring is derived from the lower bounds on the required samples in the analytical method, which implies that HybridNN requires the lowest possible number of samples that achieves the same accuracy guarantee as the analytical method.

• HybridNN uses a biased-sampling-based concentric ring maintenance scheme in order to sample enough nodes for each ring. Specifically, unlike previous neighbor discovery based on a gossip protocol, we also periodically discover a small number of the nearest and farthest nodes to each node as neighbors in the concentric rings. This is because, in a set of concentric rings, the innermost and outermost rings contain only a few neighbors compared to the other rings, and these neighbors are hard to sample using a gossip-based neighbor discovery protocol.

(ii) Reducing measurement costs. HybridNN adopts scalable 
delay predictions to reduce the measurement costs. 

• HybridNN maintains the concentric rings using estimated pairwise delays from the revision [20] of the Vivaldi network coordinate [21], which significantly reduces the maintenance overhead of HybridNN compared to Meridian.

• HybridNN selects candidate neighbors that are close to the target using delay predictions. Since delay predictions are only approximations of real-world delays, HybridNN also uses a small number of delay probes to avoid being misled by inaccurate predictions. Interestingly, although network coordinate distances are symmetric, we empirically find that our hybrid delay measurement approach provides the accurate nearest next-hop neighbor for both symmetric and asymmetric delay data sets. This is because we replace inaccurate coordinate distances with direct probes, using the error indicator of the Vivaldi coordinate, which relieves the mismatch between symmetric coordinate distances and asymmetric delays.
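
The hybrid measurement idea can be sketched as follows, assuming each candidate carries a coordinate-predicted delay to the target and a Vivaldi-style local error value. The shortlist size `k`, the `error_threshold` cutoff, the `Candidate` type and the `probe` callback are hypothetical names introduced for illustration; the text does not specify how HybridNN sets these parameters.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_ms: float  # coordinate distance to the target
    error: float         # local error indicator of the coordinate (lower is better)

def choose_next_hop(candidates, probe, k=3, error_threshold=0.3):
    """Prune to the k candidates with the smallest predicted delay, then trust a
    prediction only if its error is below the threshold; otherwise probe directly."""
    shortlist = sorted(candidates, key=lambda c: c.predicted_ms)[:k]
    best, best_delay = None, float("inf")
    for c in shortlist:
        delay = c.predicted_ms if c.error < error_threshold else probe(c.name)
        if delay < best_delay:
            best, best_delay = c.name, delay
    return best, best_delay

# Hypothetical ground-truth delays returned by a direct probe.
true_delay = {"n1": 80.0, "n2": 20.0, "n3": 55.0, "n4": 10.0}
cands = [Candidate("n1", 15.0, 0.8),   # looks close, but the prediction is unreliable
         Candidate("n2", 25.0, 0.1),
         Candidate("n3", 50.0, 0.1),
         Candidate("n4", 90.0, 0.9)]   # badly predicted; falls outside the shortlist
print(choose_next_hop(cands, true_delay.__getitem__))  # ('n2', 25.0)
```

With `k=3` the sketch probes only n1 and returns n2; the truly nearest node n4 is pruned because its prediction is poor, which illustrates why the pruned set has to be large enough.
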

Third, we validate our algorithm using real-world delay data sets and PlanetLab deployments. Through a simulation study, we show that HybridNN finds servers close to optimal for symmetric and asymmetric delay data sets. In more than 95% of cases, HybridNN locates the ground-truth nearest servers for the targets. Furthermore, most queries terminate within four search hops, which implies that HybridNN can return search results quickly. Using PlanetLab deployments, we confirm that HybridNN can locate accurate nearest servers with low query load and control overhead, and with a moderate query time that improves on Meridian in more than 15% of cases.

II. System Model 
A. Problem Definition 

In this section, we formally define the nearest server location 
problem. Let V denote a set of service nodes and hosts. Let a 
distance function d denote the pairwise delays between node 
pairs in V. Let N be the number of service nodes. 

Our objective is to minimize the serving delays of latency-sensitive applications by finding, for a requesting host, the service node with the minimum delay. As discussed in the previous section, we target a generalized delay optimization scenario where the delay may be symmetric or asymmetric depending on the problem context and measurement tools. Furthermore, service nodes may be added or removed, which causes system churn. As a result, we need to locate the service node closest to the target among a dynamic set of service nodes.

We study a distributed approach to realize our objective, since the centralized approach has several well-known weaknesses: it requires global delay measurements that are hard to obtain for dynamic service nodes, and it introduces a single point of failure. The distributed approach avoids such weaknesses through the collaboration of service nodes. Specifically, we formulate Distributed Nearest Neighbor Search (DNNS) as follows:

Definition II.1 (Distributed Nearest Neighbor Search): For a set of dynamic service nodes, given any target T on the Internet, the objective of Distributed Nearest Neighbor Search is to find a service node that has the smallest delay to T, based on the distributed collaboration of service nodes.

The definition of DNNS is not novel, since existing research on closest server discovery [22], [23], [12], [13], [9], [10] has formulated a similar problem. Intuitively, DNNS consists of multiple steps. At each step, a current service node P tries to locate a new service node that is closer to the target T than node P. The flowchart of a sample DNNS query is shown in Fig. 2. When a host T accesses a networked service, the local service client module creates a DNNS query to locate the service machine nearest to the client T. The query message is first forwarded to the bootstrap machine of the DNNS service (Step 1). Then our DNNS query system forwards the query message recursively until it locates a nearest service machine (Steps 2 and 3). Finally, our system returns the contact addresses of the found service nodes to host T (Step 4).

Fig. 2. A DNNS query service substrate for network services.
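
The recursive forwarding of Steps 2 and 3 can be sketched as a greedy loop. This is an illustrative sketch, not the HybridNN algorithm: `neighbors` and `delay` are hypothetical callbacks, and the stopping rule (move only to a neighbor whose delay is strictly less than `beta` times the current one) borrows the β-closer rule that Meridian uses, with `beta=1.0` meaning "strictly closer".

```python
def dnns_query(start, target, neighbors, delay, beta=1.0, max_hops=32):
    """Greedy iterative search: forward to the neighbour nearest the target while
    its delay is strictly less than beta times the current node's delay.

    neighbors(node) -> iterable of service nodes known to `node`
    delay(node, target) -> measured delay between node and target
    """
    current, d_cur = start, delay(start, target)
    path = [current]
    for _ in range(max_hops):
        best = min(neighbors(current), key=lambda n: delay(n, target), default=None)
        if best is None or delay(best, target) >= beta * d_cur:
            break  # no sufficiently closer neighbour: stop here
        current, d_cur = best, delay(best, target)
        path.append(current)
    return current, d_cur, path

# Hypothetical overlay: delays from each service node to the target host T (ms).
to_T = {"S0": 100.0, "S1": 60.0, "S2": 30.0, "S3": 45.0, "S4": 12.0}
links = {"S0": ["S1", "S3"], "S1": ["S0", "S2"], "S2": ["S1", "S4"],
         "S3": ["S0"], "S4": ["S2"]}

def measure(node, target):
    return to_T[node]

print(dnns_query("S1", "T", links.__getitem__, measure))  # ('S4', 12.0, ['S1', 'S2', 'S4'])
print(dnns_query("S0", "T", links.__getitem__, measure))  # ('S3', 45.0, ['S0', 'S3'])
```

Starting at S1, the query reaches the optimum S4 in two hops; starting at S0, it stops at S3, a local minimum of the kind the introduction attributes to clustering and TIVs.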

B. Key DNNS Requirements 

To be useful for latency-sensitive applications, DNNS should meet the following key goals:

• Accurate: DNNS should find a service node with the lowest interactive delay in order to increase the users' Quality of Experience.

• Fast: DNNS should obtain the nearest service node with short query times; otherwise, long query times make DNNS less attractive for server redirection in latency-sensitive applications.

• Scalable: the DNNS process should incur low bandwidth costs as the system size increases.

• Resilient to churn: the DNNS process should find accurate results when service nodes crash or new service nodes are added.

C. Discussion 

Since the DNNS process may last several seconds due to on-demand probing, performing DNNS for each host query may even hurt the users' Quality of Experience, an effect that is significant for small Web objects. For example, Google typically returns responses in less than 0.4 seconds; such short response times are difficult to achieve when the DNNS process runs before the responses are returned.

Therefore, in order to realize a practical nearest-server redirection service, we need to proactively run DNNS for each host and redirect hosts' requests using cached DNNS results, in order to achieve millisecond-level response times. For example, OASIS [9] shows that it is feasible to cache DNNS results for IP prefixes for server redirection without reducing DNNS accuracy.



We do not study how to organize cached results in this paper; instead, we assume that a DNNS caching service exists to map hosts' requests to the nearest servers using cached DNNS results. Our focus is to realize an accurate, scalable and resilient DNNS system with short DNNS query times, since if DNNS queries take a long time, crawling DNNS for every IP prefix becomes less efficient.

III. Related Work 

First, in theoretical computer science, research on nearest neighbor search mainly focuses on designing efficient algorithms in metric spaces [24], [25], [26], [27]. However, applying metric-space algorithms to DNNS is inappropriate, since the delay space violates the triangle inequality that the metric space model requires [20].

On the other hand, for the network system field, research 
on nearest neighbor search can be classified into centralized 
and distributed approaches according to the communication 
patterns of the search process. 

A. Centralized Approaches 

The centralized scheme uses a centralized sorting process to select nearest neighbors for target nodes. However, the centralized approach does not scale well with increasing system size, since collecting and transmitting the distance measurements can easily create performance bottlenecks that degrade service availability.

Guyton et al. [11] pioneered research on finding the closest server replica in a centralized manner. They use Hotz's metric [28] to represent pairwise hop distances using O(N) measurements to landmark nodes, where N denotes the number of server replicas. However, smaller hop distances do not imply shorter delays, because a single hop may span continents or merely cross a data center. Later, Carter and Crovella [29], [30] combine RTT and available bandwidth measurements to dynamically select the server replica with minimal response time. However, this dynamic server selection approach does not scale well due to its quadratic measurement costs. Netvigator [31] collects RTT values from hosts to landmarks and milestone nodes based on Traceroute measurements, and estimates nearest servers based on local clustering. However, Netvigator does not guarantee estimation accuracy, and may yield obsolete results since it does not perform active measurements. Unlike Netvigator, CRP [32] leverages the dynamic association of nodes with CDN replica servers to determine the proximity between end hosts. CRP incurs low maintenance costs, similar to Netvigator, but does not guarantee accuracy. iPlane [33], [34] constructs a synthetic topology of the Internet and estimates the nearest servers using the approximated delays on this topology. However, to serve hosts spanning geo-distributed places, iPlane consumes heavy bandwidth to perform active measurements.

B. Distributed Approaches 

The DNNS approach iteratively selects closer nodes using 
distributed nearest neighbor search by local measurements 
towards a small set of neighbors, which reduces the network
measurement overhead and is more scalable than the central- 
ized approach. Existing DNNS methods fall into four families 
based on their search rules: (i) Bin based DNNS; (ii) Topology 
based DNNS; (iii) Greedy search based DNNS; (iv) Ring 
search based DNNS. 

Bin based DNNS. Ratnasamy et al. [1] assign nodes into "bins" based on the ordered sequence of RTT measurements to landmarks, and declare nodes in the same bin to be close to each other. However, the bin approach does not guarantee accuracy, and fails when the landmarks crash.
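
A minimal sketch of the landmark-ordering idea, assuming each node already knows its RTTs to a fixed list of landmarks. The original binning scheme also quantizes RTTs into levels; that refinement is omitted here, and the function names and RTT values are hypothetical.

```python
def landmark_bin(rtts_to_landmarks):
    """Bin label = landmark indices sorted from nearest to farthest."""
    return tuple(sorted(range(len(rtts_to_landmarks)), key=lambda i: rtts_to_landmarks[i]))

def group_by_bin(nodes):
    """Map each bin label to the nodes that share it (and are declared 'close')."""
    bins = {}
    for name, rtts in nodes.items():
        bins.setdefault(landmark_bin(rtts), []).append(name)
    return bins

# Hypothetical RTTs (ms) from three nodes to three landmarks L0, L1, L2.
nodes = {"a": [12.0, 80.0, 150.0], "b": [15.0, 95.0, 140.0], "c": [130.0, 20.0, 70.0]}
print(group_by_bin(nodes))  # {(0, 1, 2): ['a', 'b'], (1, 2, 0): ['c']}
```

If a landmark crashes, every label changes length, which is one concrete way the scheme "fails when the landmarks crash".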

Topology based DNNS. Tiers [35] locates the nearest nodes by a top-down approach with a hierarchical clustering tree, but may cause load imbalance for nodes near the root of the tree. Besides, Tiers does not guarantee search accuracy, since the tree does not strictly preserve pairwise proximity.

Greedy search based DNNS. Mithos [23] iteratively locates proximate neighbors with O(N) hops by a gradient-descent-based protocol during overlay construction, but may terminate before locating the real nearest nodes due to the limited diversity of the neighbor set. PIC [12] iteratively locates nearest neighbors at each search step in terms of coordinate distance. However, PIC is prone to being trapped in local minima, since coordinate distances only approximate delays. DONAR [10] redirects host requests to optimal server replicas by considering network proximity, routing optimization and server loads. DONAR uses geographic distances as the proximity metric in order to reduce measurement costs. However, DONAR may find suboptimal server replicas for delay minimization, since delay values are not consistent with geographic distances.

Ring search based DNNS. Our work is closely related to Meridian [13], which seeks approximately nearest nodes in $\log N$ steps. Meridian [13] maintains a loosely connected overlay using a gossip-based peer-finding scheme. The neighbors are organized in concentric rings with exponentially increasing radii. For a DNNS request, Meridian iteratively locates a next-hop node that is $\beta$ ($\beta < 1$) times closer to the target T than the current Meridian node. Compared to other families of DNNS, Meridian is more accurate because its rings of neighbors promote the diversity of neighbor sets [14]. However, several studies have found that Meridian may fail to find the closest service node due to the last-hop clustering of servers [14] and TIVs in the network delay space [20]. Similar to Meridian, OASIS [9] organizes neighbors as concentric rings for each service node, and iteratively searches for the nearest service node for the requesting host in terms of geographic distances. OASIS reduces Meridian's delay measurement costs through static geographic coordinates, and has low response time using in-advance probes. However, OASIS does not guarantee the accuracy of its search results, since the geographically closest servers may incur high delays [6].
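
The concentric-ring organization can be sketched as follows. Ring 0 holds neighbors within `alpha` ms and ring i holds delays in $(\alpha s^{i-1}, \alpha s^{i}]$; the values of `alpha`, `s`, `num_rings` and `per_ring` are hypothetical defaults, not Meridian's or HybridNN's actual configuration. The `extremes` parameter is a rough stand-in for the biased sampling described in Section I: the nearest and farthest candidates are offered ring slots first, so a crowded ring cannot push them out.

```python
def ring_index(delay_ms, alpha=1.0, s=2.0, num_rings=9):
    """Ring 0 covers [0, alpha]; ring i covers (alpha*s**(i-1), alpha*s**i];
    the last ring also absorbs every larger delay."""
    i, radius = 0, alpha
    while delay_ms > radius and i < num_rings - 1:
        i += 1
        radius *= s
    return i

def build_rings(candidates, per_ring=4, extremes=1, **ring_kw):
    """Assign candidate neighbours (name -> estimated delay) to rings, at most
    per_ring each, offering the `extremes` nearest and farthest nodes first."""
    ranked = sorted(candidates, key=candidates.get)
    priority = ranked[:extremes] + ranked[::-1][:extremes]
    order = list(dict.fromkeys(priority + ranked))  # de-duplicate, keep order
    rings = {}
    for node in order:
        ring = rings.setdefault(ring_index(candidates[node], **ring_kw), [])
        if len(ring) < per_ring:
            ring.append(node)
    return rings

cands = {"a": 0.5, "b": 3.0, "c": 3.5, "d": 3.8, "e": 300.0, "f": 400.0, "g": 500.0}
print(build_rings(cands, per_ring=2))  # {0: ['a'], 8: ['g', 'e'], 2: ['b', 'c']}
```

With `per_ring=2`, the farthest candidate g keeps a slot in the outermost ring; calling `build_rings(cands, per_ring=2, extremes=0)` fills that ring with e and f instead.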

To address these problems, two adjustments have been proposed: (i) explicitly finding the clustered subsets based on the structure of IP addresses [14], or (ii) adding neighbors for DNNS that may otherwise not be chosen due to TIVs [20]. However, finding the clusters of nodes sharing identical last hops becomes insufficient when the service nodes spread over nearby subnets, which may still mislead DNNS queries because no forwarding node is close enough to the target. Furthermore, finding all neighbors affected by TIVs is challenging, since calculating TIVs among decentralized service nodes is very difficult; besides, adding neighbors for DNNS also increases the query overhead. Due to the limitations of these modifications to Meridian, significant challenges remain in DNNS. We focus on tackling these challenges in this paper.

IV. Data Sets 

Our empirical data sets include four publicly available real-world RTT data sets, covering the delay measurements between wide-area DNS servers and those between end hosts [36]. (i) DNS3997. An RTT matrix collected between 3997 DNS servers by Zhang et al. [37] using the King method [38]. The matrix is symmetric in that $d_{ij} = d_{ji}$ for any pair of items i and j, where d denotes the delay matrix. (ii) Host479. An RTT matrix based on RTT measurements over a 15-day period between Vuze BitTorrent clients [39]. Host479 is asymmetric: in over 40% of cases, the delay pairs $d_{AB}$ and $d_{BA}$ differ by more than a factor of 4. This is because RTT measurements between node pairs are not synchronized, and the results are affected by varying queueing delays at end hosts [39]. (iii) DNS1143. An RTT matrix between 1143 DNS servers collected by the MIT P2PSim project [40] using the King method [38]. The matrix is symmetric in that $d_{ij} = d_{ji}$ for any pair of items i and j. (iv) DNS2500. An RTT matrix between 2500 DNS servers collected by the Meridian project [13] using the King method. The matrix is also symmetric.
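
The asymmetry statistic quoted for Host479 can be computed directly from a delay matrix. A sketch, assuming `d[i][j]` holds the measured delay from i to j as a list of lists with no missing entries (real data sets need missing values filtered first); the function names and the 3-node matrix are made up for illustration.

```python
def asymmetric_fraction(d, factor=4.0):
    """Fraction of node pairs whose two directed delays differ by more than `factor`."""
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    skewed = sum(1 for i, j in pairs
                 if max(d[i][j], d[j][i]) > factor * min(d[i][j], d[j][i]))
    return skewed / len(pairs)

def is_symmetric(d):
    """True when d[i][j] == d[j][i] for every pair."""
    n = len(d)
    return all(d[i][j] == d[j][i] for i in range(n) for j in range(i + 1, n))

# Hypothetical 3-node RTT matrix measured in both directions (ms).
d = [[0.0, 10.0, 50.0],
     [60.0, 0.0, 20.0],
     [12.0, 22.0, 0.0]]
print(is_symmetric(d), asymmetric_fraction(d))  # False 0.6666666666666666
```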

Since obtaining one-way delays among a large number of nodes is extremely difficult, we use Host479 as an asymmetric delay data set. However, we do not claim that our experiments on Host479 are equivalent to experiments on the one-way delay metric.

V. A Generalized Delay Model for the Delay Space

In this section, we present a simple yet sufficiently general model for the delay space. Our model captures the important characteristics of delays, including TIV, dynamics and asymmetry of RTTs and OWDs. In the next section, we analyze the DNNS problem on our model.

Assume that we select a node P in V as the center of a ball and choose a positive real number r as its radius. We call the set of nodes whose delay $d(P, v)$ with respect to node P is not larger than r the closed ball $B_P(r)$, i.e., $B_P(r) = \{v \mid d(P, v) \le r,\ v \in V\}$. The volume of a ball is the number of nodes covered by the ball. Besides, we define the cover relation between sets of nodes as follows:

Definition V.1 (Cover). Let S and $\Omega$ be two sets of nodes. If $\Omega \subseteq S$, then the set S is said to cover the set $\Omega$.
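
The ball and cover definitions translate directly into set operations. A sketch over a delay matrix `d` indexed by node number, where `d[p][v]` is taken as $d(P, v)$; the function names and the 4-node matrix are illustrative.

```python
def ball(d, p, r):
    """Closed ball B_P(r): every node v with d(P, v) <= r, including P itself."""
    return {v for v in range(len(d)) if d[p][v] <= r}

def volume(d, p, r):
    """Volume of B_P(r): the number of nodes it covers."""
    return len(ball(d, p, r))

def covers(s, omega):
    """S covers Omega iff Omega is a subset of S (Definition V.1)."""
    return set(omega) <= set(s)

# Hypothetical delays (ms); row p holds d(p, v).
d = [[0, 10, 40, 90],
     [10, 0, 35, 80],
     [45, 30, 0, 50],
     [95, 85, 55, 0]]
print(ball(d, 0, 40), volume(d, 0, 40), covers(ball(d, 0, 40), ball(d, 0, 10)))
# {0, 1, 2} 3 True
```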

A. Definition 

We first state the requirements for a delay model suitable for RTTs and OWDs used in delay minimization. (i) The delay model should relax the symmetry requirement, since OWDs are asymmetric due to routing asymmetry [41]. Besides, although RTT is symmetric when accounting only for the delays on the routing paths, real-world RTT measurements may be asymmetric due to variations of queueing delays at end hosts or unsynchronized measurements [42]. (ii) The delay model d should allow TIVs to exist, since the RTT metric exhibits TIVs [15]. (iii) The delay model d should allow dynamic delays, since delays vary over time [19].

Therefore, inspired by the inframetric model [43] that allows 
the TIVs, we extend the inframetric model to a relaxed infra- 
metric model that relaxes the symmetry requirement, where 
the distance function d satisfies: 

Definition V.2 (Relaxed Inframetric Model). A distance function $d: V \times V \to \mathbb{R}^+$ is a relaxed $\rho$-inframetric ($\rho \ge 1$) if d satisfies the following conditions for any pair of nodes u and v: (1) if $d(u, v) = 0$, then $u = v$; (2) $d(u, v) \le \rho \max\{d(u, w), d(v, w)\}$ for any node $w \notin \{u, v\}$.
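
Condition (2) can be checked by brute force over all ordered triples, and the smallest admissible $\rho$ for a given matrix is the largest ratio $d(u,v)/\max\{d(u,w), d(v,w)\}$. The sketch below is $O(n^3)$, so it suits small matrices or sampled triples; the function names and the example matrix are hypothetical.

```python
from itertools import permutations

def satisfies_relaxed_inframetric(d, rho):
    """Check conditions (1) and (2) of Definition V.2 on a (possibly asymmetric) matrix."""
    n = len(d)
    # (1): a zero distance is only allowed from a node to itself.
    if any(d[u][v] == 0 for u in range(n) for v in range(n) if u != v):
        return False
    # (2): d(u,v) <= rho * max(d(u,w), d(v,w)) for every w outside {u, v}.
    return all(d[u][v] <= rho * max(d[u][w], d[v][w])
               for u, v, w in permutations(range(n), 3))

def smallest_rho(d):
    """Smallest rho for which condition (2) holds (assumes condition (1) holds)."""
    n = len(d)
    return max(d[u][v] / max(d[u][w], d[v][w])
               for u, v, w in permutations(range(n), 3))

# Asymmetric example: d(0,1) = 4 but d(1,0) = 1.
d = [[0, 4, 2],
     [1, 0, 2],
     [2, 2, 0]]
print(smallest_rho(d))  # 2.0
```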

Pros of the Relaxed Inframetric Model: Condition (2) in Definition V.2 states a generalized relation for any directed triple from V, which has three beneficial properties:

• TIV-adaptive. Intuitively, a smaller $\rho$ implies that the three edges have similar lengths, while a larger $\rho$ implies that one edge is significantly longer than both of the other two edges, which may introduce a TIV. Therefore, like the inframetric model, the relaxed inframetric model naturally allows TIVs to occur.

• Dynamics-adaptive. The inframetric model accommodates delay variations by varying the inframetric parameter $\rho$ to describe the relations of updated triples. Therefore, both the inframetric model and the relaxed inframetric model are able to model variations of triples caused by delay variations.

• Asymmetry-aware. The relaxed inframetric model al- 
lows the asymmetry in the delay space, which generalizes 
to RTTs and OWDs. As a result, we are able to analyze 
DNNS on symmetric and asymmetric delays through the 
relaxed inframetric model. 

Having shown the advantages of the relaxed inframetric model, we next discuss the statistical properties of the inframetric parameter $\rho$.

First, the seminal work [43] shows that if the delay space obeys the triangle inequality, then $\rho$ must be smaller than or equal to 2. However, even when $\rho$ is smaller than 2, TIVs may exist. For example, given a triple with pairwise RTTs 3, 1 and 1.8, the inframetric parameter is $\rho = 3/1.8 \approx 1.67$, yet the triple contains a TIV since $3 > 1 + 1.8$. Therefore, $\rho \le 2$ is only a necessary, not a sufficient, condition for the absence of TIVs.
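
The worked example above can be checked numerically. For a symmetric triple, $\rho$ is the longest edge divided by the second-longest edge; this per-triple formula is our reading of the example rather than a definition stated in the text, and the function names are illustrative.

```python
def triple_rho(a, b, c):
    """rho of a symmetric triple: longest edge over the second-longest edge."""
    _, mid, longest = sorted((a, b, c))
    return longest / mid

def has_tiv(a, b, c):
    """Triangle inequality violation: the longest edge exceeds the sum of the others."""
    shortest, mid, longest = sorted((a, b, c))
    return longest > shortest + mid

print(round(triple_rho(3, 1, 1.8), 2), has_tiv(3, 1, 1.8))  # 1.67 True
```

The same two functions show why $\rho \le 2$ is necessary: without a TIV, the longest edge z satisfies $z \le x + y \le 2y$, so $z / y \le 2$, which is the boundary case `triple_rho(2, 1, 1) == 2.0`.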

Second, we find that the inframetric parameter $\rho$ is quite low for most triples. The 95th percentile of $\rho$ is below 2.5 for all data sets; a low $\rho$ means that the largest edge in a triple is not much larger than the other edges of the triple. Moreover, among the triples whose $\rho$ exceeds 2, the average $\rho$ is around 3. Therefore, selecting $\rho = 3$ is reasonable to model most of the triples.
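
Statistics such as the 95th percentile of $\rho$ can be gathered by enumerating triples. A sketch, assuming a symmetric matrix and the per-triple $\rho$ (longest edge over second-longest edge); `statistics.quantiles` requires Python 3.8 or later, and the inclusive method keeps the estimate within the observed range. For thousands of nodes one would sample triples rather than enumerate all of them; the function names and the example points are hypothetical.

```python
import statistics
from itertools import combinations

def rho_values(d):
    """Per-triple rho for every unordered triple of a symmetric delay matrix."""
    values = []
    for i, j, k in combinations(range(len(d)), 3):
        _, mid, longest = sorted((d[i][j], d[j][k], d[i][k]))
        values.append(longest / mid)
    return values

def summarize(rhos, threshold=2.0):
    """Return (95th percentile of rho, mean rho among triples with rho > threshold)."""
    p95 = statistics.quantiles(rhos, n=20, method="inclusive")[18]
    above = [r for r in rhos if r > threshold]
    return p95, (statistics.mean(above) if above else None)

# Five hosts on a line form a metric, so every triple has rho <= 2.
xs = [0.0, 1.0, 3.0, 7.0, 15.0]
line = [[abs(a - b) for b in xs] for a in xs]
print(summarize(rho_values(line)))
```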



B. Dimensions on the Relaxed Inframetric Model 

Having introduced the definition of the relaxed inframetric model, we now analyze its growth dimension, which is the ratio of the numbers of nodes covered by two closed balls with the same center and different radii [43], [44].

The growth dimension is important for efficient DNNS. As shown by Karger and Ruhl [44], if the growth dimension is low, each node P can uniformly sample a modest number of nodes to locate a node that is closer to any given node in V. Therefore, we can recursively find nodes closer to the target based on this sampling procedure, which guides the design of DNNS algorithms. However, since Karger and Ruhl assume that the triangle inequality holds [44], we cannot directly apply their DNNS results to the relaxed inframetric model. Accordingly, we need new proof techniques for the DNNS analysis.

The growth dimension for the inframetric space [43] is defined as follows:

Definition V.3 (Growth [43]). For a $\rho$-inframetric model, if $|B_P(\rho r)| \le \gamma_g |B_P(r)|$ for any $r \in \mathbb{R}^+$ and $P \in V$, where $\gamma_g \in \mathbb{R}^+$, the $\rho$-inframetric model is said to have growth $\gamma_g$.

The growth dimension $\gamma_g$ on the inframetric model generalizes the growth definition in metric spaces, which assumes the triangle inequality to hold [14], [37]. Therefore, the growth $\gamma_g$ inherits the intuitive meaning of the metric-space growth definition. Specifically, a low growth $\gamma_g$ means that the number of nodes covered by the closed ball $B_P(\rho r)$ is comparable to the number of nodes covered by the closed ball $B_P(r)$. Therefore, when we expand a ball around a node $P \in V$, new nodes in V "come into view" at a constant rate [44].

Finally, based on Definition V.3, the infimum of the growth dimension $\gamma_g$ equals the maximum ratio $|B_P(\rho r)| / |B_P(r)|$ over all nodes P and radii r. Since we are interested in the infimum, when we refer to the growth of the inframetric space, we mean this infimum.

Next, we empirically evaluate the growth dimension of the delay space with respect to the radius r and the inframetric parameter $\rho$. Our evaluation complements the seminal work on growth in the inframetric model [43] by using symmetric and asymmetric data sets. Note that computing the growth is straightforward: we compare the volumes of balls with identical centers and varying radii.
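
A sketch of that computation, assuming a delay matrix `d` with `d[p][v]` as $d(P, v)$: the growth for one center and radius is a ratio of ball volumes, and the smallest $\gamma_g$ valid for a set of radii is the maximum of these ratios. Checking only a finite list of radii gives a lower estimate of the true infimum; the helper names and the example points are illustrative.

```python
import statistics

def growth_ratio(d, p, r, rho):
    """|B_P(rho * r)| / |B_P(r)| for centre p and radius r."""
    small = sum(1 for v in range(len(d)) if d[p][v] <= r)  # >= 1 since d[p][p] == 0
    large = sum(1 for v in range(len(d)) if d[p][v] <= rho * r)
    return large / small

def growth(d, rho, radii):
    """Smallest gamma_g satisfying Definition V.3 over all centres and the given radii."""
    return max(growth_ratio(d, p, r, rho) for p in range(len(d)) for r in radii)

def median_growth(d, rho, r):
    """Median ratio across centres for one radius, the statistic plotted per radius in Fig. 3."""
    return statistics.median(growth_ratio(d, p, r, rho) for p in range(len(d)))

xs = [0.0, 1.0, 2.0, 4.0, 8.0]
d = [[abs(a - b) for b in xs] for a in xs]
print(growth(d, rho=3, radii=[1, 2]), median_growth(d, rho=3, r=1))  # 3.0 1.5
```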

Fig. 3 shows the median and 90th percentile growth values for varying radii. The median growth is relatively small for most data sets, and declines quickly with increasing radii for all data sets except Host479, whose median growth may increase as the radii increase. On the other hand, the 90th percentile growth shows divergent dynamics across data sets, revealing "M"-shaped curves, which indicates that a small fraction of growth values may increase or decrease with increasing radii.

Furthermore, by selecting different percentages of nodes for the statistics, Fig. 3 shows that the median growth is less sensitive to the sample size than to the magnitudes of the radii, while the 90th percentile growth is relatively more sensitive to the sample size.

Fig. 3. The statistics of the median and 90th percentile growth $\gamma_g$ for $\rho = 3$. Separate curves show the median and the 90th percentile values computed from sampled 20%, 50% and 75% of the nodes and from all nodes.

In summary, the growth $\gamma_g$ of the delay space is quite low. Furthermore, with increasing radius, the growth $\gamma_g$ quickly decreases to 2 on average. However, the growth sometimes increases with the radius, which means that many nodes lie at similar distances from the ball's center. This usually happens when the center of the ball is a node on the edge of a cluster, where nodes in the same cluster have smaller distances to each other than to nodes outside the cluster.



VI. Efficient DNNS on the Relaxed Inframetric Model

In this section, using the relaxed inframetric model presented in Sec. V, we analyze how to design an efficient DNNS using localized operations suitable for distributed systems. Proofs are omitted due to space limits and can be found in the full report [36].

Our major result is that it is feasible to design an accurate and fast DNNS algorithm for the relaxed inframetric model, at the expense of sampling enough candidate servers from the proximity region of each node. We construct a simple DNNS process that satisfies our major result. However, this simple DNNS process incurs relatively high measurement costs due to the sampling conditions, which we improve in the next section.




Fig. 4. Sampling nodes closer to a target T from $B_P(\rho r)$ in the $\rho$-inframetric model with growth $\gamma_g$.



A. Sampling Conditions to Locate Closer Nodes To Targets 

In this section, we analyze the samples required to locate a node closer to a target than the current node, based on the growth dimension in Sec. IV-B. These sampling conditions serve as the basis for the efficient DNNS algorithm design.

Our results show that a server closer to the target can be sampled using a bounded number of samples at each node. To obtain a node that is $\beta$ ($\beta \in (0, 1]$) times closer to the target than the current node, we need to uniformly sample enough neighbors from the proximity region of each current node.

Without loss of generality, assume that a node P needs to locate a node Q that is $\beta$ ($\beta < 1$) times closer to a target T, which implies that $d_{QT} \le \beta \times d_{PT}$. Let $d_{PT} = r$. Node Q must then be covered by the ball $B_P(\rho r)$, since $d_{PQ} \le \rho \max\{d_{PT}, d_{QT}\} = \rho r$. Fig. 4 shows an example of sampling a node closer to the target T in the closed ball $B_P(\rho r)$ in the growth dimension.

We first quantify the volume differences of balls with 
identical centers but different radii. 

Lemma VI.1. Given a $\rho$-inframetric with growth $\gamma_g > 1$, for any $x > \rho$, $r > 0$, and any node P, the volume of the ball $B_P(r)$ is at most $x^{\alpha}$ times smaller than that of the ball $B_P(xr)$, where $\log_\rho \gamma_g \le \alpha \le 2\log_\rho \gamma_g$.

Lemma VI.1 states that the volume differences of balls with identical centers and different radii are bounded by $x^{\alpha}$, where $x$ is the multiplicative ratio between the radii and the parameter $\alpha$ lies in a bounded interval.

We calculate $\alpha$ by varying the radius $r$ and the multiplicative ratio $x$, as shown in Fig. 5. We can see that $\alpha$ is mostly below 1 and decreases quickly with increasing radius $r$ or multiplicative ratio $x$. Therefore, the volume difference $x^{\alpha}$ scales sub-linearly in most cases. On the other hand, for a small radius $r$ or a low multiplicative ratio $x$, the volume difference $x^{\alpha}$ may scale super-linearly.

Furthermore, we characterize the inclusion relation of balls with different centers, which generalizes the inclusions of balls around a node pair in the metric space [44]. Lemma VI.2 lays the foundation for uniformly sampling nodes to perform DNNS on the inframetric model.

Lemma VI.2. (Sandwich lemma) For any pair of nodes p and q with $d_{pq} \le r$,

$$B_q(r) \subseteq B_p(\rho r) \subseteq B_q(\rho^2 r)$$
Fig. 5. Median $\alpha$ as a function of the radius $r$ and the multiplicative ratio $x$.



Using Lemmas VI.1 and VI.2, we can quantify the number of sampled neighbors needed to ensure that at least one neighbor lies in the closed ball $B_T(\beta r)$.

Theorem VI.3. (Sampling efficiency in the growth dimension) For a $\rho$-inframetric model with growth $\gamma_g > 1$, a service node P, and a DNNS target T satisfying $d_{PT} \le r$, when selecting $3(\rho^2/\beta)^{\alpha}$ nodes uniformly at random from $B_P(\rho r)$ with replacement, with probability of at least 95% one of these nodes will lie in $B_T(\beta r)$, where $\log_\rho \gamma_g \le \alpha \le 2\log_\rho \gamma_g$ and $\beta \le 1$.

Since $\alpha$ and $\rho$ are determined by the delay space, the number of samples decreases with increasing delay reduction threshold $\beta$. As $\beta$ approaches 1, the number of required samples becomes approximately $3\rho^{2\alpha} \in [3\gamma_g^2, 3\gamma_g^4]$, based on Lemma VI.1.
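The sample count in Theorem VI.3 and its 95% guarantee can be checked numerically. The Python sketch below is illustrative rather than the authors' code: the helper names `required_samples` and `miss_probability` are hypothetical, and it assumes, as implied by Lemmas VI.1 and VI.2, that each uniform draw from $B_P(\rho r)$ lands in $B_T(\beta r)$ with probability at least $(\beta/\rho^2)^{\alpha}$.

```python
import math


def required_samples(rho: float, beta: float, alpha: float) -> int:
    """Sample count 3 * (rho^2 / beta)^alpha from Theorem VI.3, rounded up."""
    if rho < 1:
        raise ValueError("rho must be at least 1")
    if not 0 < beta <= 1:
        raise ValueError("beta must lie in (0, 1]")
    return math.ceil(3 * (rho ** 2 / beta) ** alpha)


def miss_probability(rho: float, beta: float, alpha: float) -> float:
    """Upper bound on P(no sample lands in B_T(beta * r)).

    Assumes each uniform draw from B_P(rho * r) hits B_T(beta * r)
    with probability at least (beta / rho^2)^alpha.
    """
    p_hit = min(1.0, (beta / rho ** 2) ** alpha)
    n = required_samples(rho, beta, alpha)
    # Independent draws with replacement: all n draws miss with probability (1 - p)^n.
    return (1.0 - p_hit) ** n
```

For example, $\rho = 3$, $\beta = 0.8$, and $\alpha = 1$ give $3 \times 11.25 = 33.75$, i.e., 34 samples. Because $(1 - p)^n \le e^{-np}$ and $np \ge 3$, the miss probability never exceeds $e^{-3} \approx 0.0498$, which is where the 95% figure comes from.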



B. DNNS on the Inframetric Model 

In this section, we analyze DNNS on the inframetric model, characterizing the search accuracy, the number of search steps, and the search costs of a DNNS process. We prove that, by recursively applying these sampling conditions, we can locate a server that is a $1/\beta$-approximation to the optimal one: the delay from the found server to the target is at most $1/\beta$ times that from the nearest server to the target.

First, we review the goal of each DNNS step under the sampling conditions in Sec. VI-A. Assume that a node P wants to locate a node that is $\beta$ times closer to a target T than P itself. To that end, Theorem VI.3 shows that we need to sample up to $3(\rho^2/\beta)^{\alpha}$ nodes uniformly at random from $B_P(\rho r)$ with replacement.

Based on the sampling condition in Theorem VI.3, performing DNNS in the growth dimension can be formulated as the simple DNNS procedure in Definition VI.4.

Definition VI.4 (A simple DNNS method in the inframetric model). At each intermediate node P, sample $3(\rho^2/\beta)^{\alpha}$ neighbors from the closed ball $B_P(\rho d_{PT})$, forward the DNNS request to a next-hop node that is $\beta$ times closer to the target than P, and stop at a local minimum when no such next-hop node can be found.
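A minimal Python simulation of Definition VI.4 may make the procedure concrete. It is a sketch under simplifying assumptions, not the paper's implementation: the full delay matrix `delay` is assumed known (so the ball $B_P(\rho d_{PT})$ can be enumerated directly), next hops are restricted to a hypothetical list `servers`, and a hop is taken only when the best sample is both $\beta$ times closer and strictly closer to the target.

```python
import random
from typing import List, Sequence


def simple_dnns(delay: Sequence[Sequence[float]], servers: Sequence[int],
                start: int, target: int, rho: float, beta: float,
                n_samples: int, rng: random.Random) -> List[int]:
    """Return the forwarding path of the simple DNNS method (Definition VI.4).

    delay[i][j] is the (possibly asymmetric) delay from node i to node j.
    """
    path = [start]
    p = start
    while True:
        d_pt = delay[p][target]
        # Proximity region B_P(rho * d_PT), restricted to service nodes other than P.
        ball = [q for q in servers if q != p and delay[p][q] <= rho * d_pt]
        if not ball:
            return path
        # Uniform sampling with replacement, as assumed in Theorem VI.3.
        samples = [rng.choice(ball) for _ in range(n_samples)]
        best = min(samples, key=lambda q: delay[q][target])
        d_best = delay[best][target]
        # Forward only to a node that is beta times closer and strictly closer.
        if d_best <= beta * d_pt and d_best < d_pt:
            path.append(best)
            p = best
        else:
            return path  # local minimum: no sampled node is close enough
```

Requiring strict improvement matters when $\beta = 1$: without it, two equally distant nodes could keep forwarding the request to each other.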

Furthermore, Corollary VI.6 quantifies the quality of the neighbors found by this DNNS procedure. As a result, we can locate an approximately optimal nearest neighbor for a target T when $\beta$ approaches one. Moreover, the number of required search steps is a logarithmic function of the ratio $\Delta$ of the maximum delay to the minimum delay in the delay space, indicating that DNNS queries can complete quickly.

Definition VI.5 ($u$-approximation). For a DNNS request with target T, a found nearest neighbor A is a $u$-approximation if the delay from A to T is at most $u d^*$, where $d^*$ is the delay from the real nearest neighbor to T.

Corollary VI.6. For a relaxed inframetric model with growth $\gamma_g$, under the DNNS process in Definition VI.4, the found nearest neighbor is a $1/\beta$-approximation, and the number of search steps is smaller than $\log_{1/\beta} \Delta$, where $\Delta$ is the ratio of the maximum delay to the minimum delay over all pairwise delays.
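Corollary VI.6 and Definition VI.5 translate directly into two small checks. The helper names below are hypothetical; the step bound is simply $\log \Delta / \log(1/\beta)$.

```python
import math


def max_search_steps(delta: float, beta: float) -> float:
    """Upper bound log_{1/beta}(Delta) on forwarding steps (Corollary VI.6)."""
    if not 0 < beta < 1:
        raise ValueError("the bound needs 0 < beta < 1")
    if delta < 1:
        raise ValueError("Delta (max delay / min delay) is at least 1")
    return math.log(delta) / math.log(1 / beta)


def is_u_approximation(found_delay: float, optimal_delay: float, u: float) -> bool:
    """Definition VI.5: A is a u-approximation if d(A, T) <= u * d*."""
    return found_delay <= u * optimal_delay
```

For example, $\Delta = 1000$ and $\beta = 0.5$ bound the search at about 9.97 steps. The bound grows without limit as $\beta$ approaches 1, so the sketch rejects $\beta = 1$.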



C. Limitations of Theoretical Results 

To find a better next-hop neighbor without missing any closer nodes, the DNNS analysis in Sec. VI-B says that we should sample approximately $3(\rho^2/\beta)^{\alpha}$ nodes whose delays to the current node P are at most $\rho d_{PT}$. However, the number of candidate neighbors may be quite high, as shown in Fig. 6: the number of required samples exceeds 100 for $\beta$ below 0.4 or $\alpha$ above 1. Such sample counts make continuing a DNNS query prohibitively expensive.

On the other hand, the number of samples decreases with decreasing $\alpha$ or increasing $\beta$. When $\alpha$ is below 1, the number of samples is below 33 if the delay reduction threshold $\beta$ is above 0.8. Since the median values of $\alpha$ in Fig. 5 are mostly no more than 1, we need to choose a large $\beta$ to reduce the number of samples.



D. Comparison with Previous Inframetric Study 

Our relaxed inframetric model is inspired by the seminal study on the inframetric model [43], which assumes a symmetric distance function. We extend that study for Internet delays in the following aspects:

• We extend the inframetric model to allow both symmetric and asymmetric distance functions, which covers both RTTs and OWDs, two delay metrics that are important for latency-sensitive applications.
Fig. 6. The number of sampled neighbors $3(\rho^2/\beta)^{\alpha}$ when varying the volume difference parameter $\alpha$ over the interval [0, 2] (based on the analysis in Sec. VI-A) and the delay reduction threshold $\beta$. We set the inframetric parameter $\rho$ to 3 to represent most triples.

• We clearly show the relation between the inframetric parameter $\rho$ and TIVs: $\rho \le 2$ is a necessary but not sufficient condition for the absence of TIVs.

• We formulate the DNNS problem on the relaxed inframetric model and propose a simple DNNS method that finds an approximate nearest neighbor for any target using a logarithmic number of search hops. Interestingly, our simple DNNS method works on both symmetric and asymmetric delay metrics.

VII. Realizing a Practical DNNS 
A. Overcoming Limitations of the Simple DNNS Method 

Recall from Sec. VI-C that measurement costs limit the usefulness of the simple DNNS method defined in Def. VI.4. Moreover, in a distributed system, each service node lacks a global view of the delay space, so sampling enough neighbors from the closed ball centered at each service node is difficult. In this section, we discuss design principles that tackle these two difficulties.

1) Reduce Measurement Costs: We reduce measurement costs using two complementary approaches: (i) since the number of samples required by the simple DNNS method depends on several parameters, we tune these parameters to minimize the required number of samples; (ii) since network coordinates can be used for delay estimation, we use delay estimates to avoid measuring the delay from every selected sample to the target.

First, recall that the number of samples for the simple DNNS method increases quickly as the delay reduction threshold $\beta$ decreases. Therefore, to reduce the number of samples, we should set $\beta$ close to 1. Moreover, since the approximation ratio of the simple DNNS method is $1/\beta$, a large $\beta$ also yields better approximations of the nearest neighbors. As a result, we set $\beta$ to 1 to reduce the number of samples and obtain the best approximation accuracy.

Second, although the modified $\beta$ reduces the number of samples, we still need delay measurements from the selected samples to the targets, which consume bandwidth and CPU on service nodes. Therefore, we aim to reduce the required delay measurements while still obtaining the sample closest to the target. To that end, we use delay estimates based on network coordinates. However, since delay estimates incur errors due to the embedding distortion of network coordinates, relying on them alone to find nearest neighbors is unreliable. Instead, we issue direct delay measurements only when the delay estimates are likely to be inaccurate.

2) Sample Enough Neighbors for Continuing DNNS Queries: Under the simple DNNS method, each service node has to maintain enough neighbors covering different delay ranges in the delay space to find the nearest neighbor to any target. Therefore, each node has to maximize the diversity of its neighbor set.

Gossip-based neighbor management is frequently used by existing DNNS methods. For example, Meridian [13] and OASIS [19] use an anti-entropy gossip protocol to discover neighbors and store them in rings of neighbors called concentric rings. However, in our experiments, the innermost and outermost rings often hold no or only a few neighbors relative to the ring capacity, while the rings whose radii lie in the middle portion of the delay distribution are filled with too many neighbors. This leads to frequent ring management events that incur heavy computation and communication overhead.

We now explain in detail why the gossip process is insufficient. Assuming that we know the complete delay matrix, we compute for each node the percentage of nodes mapped into each ring, which serves as an upper bound on the neighbors that can be sampled for that ring. We can then analyze whether the distribution of mapped nodes across the concentric rings affects the gossip process. As shown in Fig. 7, most nodes are mapped into a few rings whose delay ranges lie in the middle portion of the delay distribution, while only a few nodes are mapped into the innermost and outermost rings, so the distribution of mapped nodes across the concentric rings is skewed. Since the gossip process samples uniformly, it will inevitably sample too few neighbors for the rings that have few mapped nodes.

Accordingly, to improve the concentric ring maintenance, 
we need to sample enough neighbors that lie in different delay 
ranges. To that end, we propose to find nearest neighbors and 
farthest neighbors for each service node, in order to fill the 
innermost and outermost rings in the concentric ring. 

B. Our Design 

Based on the design principles in Sec. VII-A, we design a novel DNNS method named HybridNN (Hybrid Nearest Neighbor Search). We now present an overview of HybridNN. To sample enough candidate neighbors from the proximity region of the current node, each node must first maintain a neighbor set that contains enough neighbors within each proximity region. Then, using the neighbor set, we select candidate neighbors according to the sampling conditions of the simple DNNS method, in order to cover the neighbors closer to the target with high probability. Next, we determine the candidate
Fig. 7. The percentage of nodes mapped into different rings, assuming that the complete delay matrix is known. The i-th ring contains neighbors whose delays to a node P lie in the interval $(a s^{i-1}, a s^{i}]$, with $i > 0$, $a$ a constant, and $s$ a multiplicative increase factor ($a = 1$ ms and $s = 2$, as configured by Wong et al. [13]). Since our objective is to determine the distribution of nodes mapped into the concentric ring, we do not limit the maximum capacity of each ring.



neighbor closest to the target, using delay estimations and 
direct probes, in order to obtain a better tradeoff between 
sampling bandwidth and accuracy. Finally, using the currently 
nearest candidate neighbor to the target, we determine whether 
to terminate the DNNS query. As shown in Fig. 8, HybridNN is composed of five components:

Neighbor Maintenance: This component maintains the neighbor set for DNNS queries. Since most nodes are mapped into the rings in the middle portion of the concentric ring, neighbors mapped into its head and tail portions are difficult to sample with a uniform sampling approach. We therefore increase the sampling probability of such neighbors in order to fulfill the sampling conditions for DNNS queries: we over-sample neighbors in the head and tail portions of the concentric rings, in addition to uniformly sampling neighbors whose delays lie in the middle portion.

Selecting Candidate Neighbor: This component selects candidate neighbors that satisfy the sampling conditions of the simple DNNS method. When a node P receives a DNNS query, node P determines its delay to the target T and then selects neighbors from its diversity-optimized neighbor sets (Sec. VII-C) so as to cover the possible closer neighbors to the target T (Sec. VII-D). Furthermore, we prune those neighbors that could mislead the DNNS query into poor local minima.
Coordinate Maintenance: This component updates the coordinate of the target in order to estimate delays from candidate neighbors to targets, since the target machine may not have a coordinate for delay estimation (Sec. VII-E). Additionally, each service machine maintains a network coordinate used for delay estimation.

Determining Closest Neighbor: This component determines 



Fig. 8. The flow chart of four search steps at a service node for a DNNS 
query. 



the neighbor nearest to the target (Sec. VII-F). Each node computes the candidate neighbor closest to the target using delay estimates and direct probes, in order to balance measurement costs and measurement accuracy.
Termination Test: This component determines whether to continue or stop a DNNS query (Sec. VII-G). Recall that in the previous section we set the delay reduction threshold $\beta$ to 1 in order to reduce the number of samples and obtain better approximation ratios. Therefore, HybridNN conservatively terminates the DNNS query only when all candidate neighbors have larger delays to the target than the current node.

Finally, HybridNN uses an extensible delay measurement interface. By default, HybridNN simply uses the system's built-in ping command to obtain pairwise RTT measurements. When an on-demand OWD probing service such as Reverse Traceroute [18] is available, HybridNN configures an RPC interface to request pairwise OWD results.

C. Neighbor Maintenance 

To facilitate neighbor sampling for DNNS forwarding, each service node maintains neighbors sampled from different regions of the delay space. We introduce neighbor discovery and update in this section.

1) Organize Neighbors into Rings for Proximity Selection: Since the proximity region for neighbor sampling in the simple DNNS method is a closed ball, we use concentric rings to organize the neighbors of each node. For instance, to locate all neighbors that are at most $d$ ms away, we select all neighbors from the rings whose ring numbers are at most $\lceil \log_2 d \rceil$.
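The ring numbering can be sketched as follows, assuming Meridian-style rings with $a = 1$ ms and $s = 2$ as in the Fig. 7 caption. The function names are hypothetical, and placing every delay of at most $a s$ ms into ring 1 is our assumption, since the text does not say where delays below $a$ go. The loop avoids floating-point `log` rounding at exact ring boundaries.

```python
from collections import Counter
from typing import Iterable


def ring_index(delay_ms: float, a: float = 1.0, s: float = 2.0) -> int:
    """Ring i holds delays in (a*s**(i-1), a*s**i]; delays <= a*s fall in ring 1 (assumption)."""
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if a <= 0 or s <= 1:
        raise ValueError("need a > 0 and s > 1")
    i, upper = 1, a * s
    while delay_ms > upper:  # grow the outer radius until it covers the delay
        upper *= s
        i += 1
    return i


def rings_covering(radius_ms: float, a: float = 1.0, s: float = 2.0) -> range:
    """Ring numbers to scan for all neighbors within radius_ms."""
    return range(1, ring_index(radius_ms, a, s) + 1)


def ring_histogram(delays_ms: Iterable[float]) -> Counter:
    """Count how many nodes map into each ring, as plotted in Fig. 7."""
    return Counter(ring_index(d) for d in delays_ms)
```

Because whole rings are selected, the result is a superset of the closed ball: the outermost selected ring may also hold neighbors slightly farther than $d$ ms.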

An important parameter of the concentric ring is its ring size $\lambda$, the maximum number of neighbors per ring. Since the concentric ring must supply enough samples to locate, with high probability, a neighbor closer to the target, we determine $\lambda$ analytically as follows. First, since we set the delay reduction threshold $\beta$ to 1, the total number of samples $3(\rho^2/\beta)^{\alpha}$ lies within the interval $[3\gamma_g^2, 3\gamma_g^4]$. Therefore, if we set the number of neighbors $\lambda$ in each ring to at least $O(\gamma_g)$, we can find, with high probability, a neighbor that is closer to the target than the current node P. Furthermore, since $\gamma_g$ is low on average, as shown in the previous sections, we can set $\lambda$ to a modest integer (8 by default).

Furthermore, to adapt to delay dynamics, we use a moving median as a latency filter to extract stable delay measurements to each neighbor [31], which keeps delay estimates up to date while remaining resilient to measurement noise.
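A moving-median latency filter of the kind described above fits in a few lines of Python. The class name and the window length of 5 samples are hypothetical defaults; the text does not state the window the authors used.

```python
from collections import deque
from statistics import median


class MovingMedian:
    """Latency filter: report the median of the last `window` delay samples."""

    def __init__(self, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be positive")
        # deque with maxlen evicts the oldest sample automatically.
        self._samples: deque = deque(maxlen=window)

    def update(self, delay_ms: float) -> float:
        """Add a new measurement and return the filtered delay estimate."""
        self._samples.append(delay_ms)
        return median(self._samples)
```

With a window of 3, feeding 10, 12, 200, 11 yields estimates 10, 11, 12, 12: the single 200 ms outlier never reaches the output, which is the property the filter is meant to provide.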
2) Biased Sampling based Neighbor Discovery: The per-ring distribution of neighbors in the previous section shows that we need to over-sample neighbors mapped into the head and tail portions of the concentric rings. To that end, we adopt both uniform sampling and over-sampling.

Uniform sampling. We reuse the gossip process of Meridian. Briefly, each node P periodically starts the gossip process by uniformly selecting a neighbor Q from P's concentric ring as its communication partner, and sends Q a gossip request message containing randomly sampled neighbors, one per non-empty ring. When Q receives the gossip request, Q immediately sends a gossip ACK to P; in addition, Q iteratively sends gossip requests to the sampled neighbors listed in P's gossip request message.

Finally, if we use the RTT metric, then node P inserts Q 
into the corresponding ring according to the round trip delays 
measured as the period between the gossip request and the 
gossip ACK. Alternatively, if node P is able to measure the 
one-way delay from P to Q, then node Q is inserted into the 
corresponding ring according to the one-way delay from P to 
Q. 

Over-sampling. Our goal is to sample enough neighbors from the nodes mapped into the head and tail portions of the concentric rings. For this purpose, we use K closest neighbor search and K farthest neighbor search. The returned nodes are stored directly in the concentric ring, since the delays between the current service node and the returned nodes are obtained during the K closest and K farthest neighbor search processes.

• K closest neighbor search. Each node P periodically finds nearby nodes by issuing a K closest neighbor search with itself as the target, where K is a system parameter. First, node P randomly selects a neighbor Q from its concentric ring and sends Q a K closest neighbor search message. Then node Q starts a K closest neighbor search process. After the process completes, the found nearby nodes and their delays to P are returned to node P, which saves them into its concentric ring.
• K farthest neighbor search. Similar to the K closest neighbor search, each node P periodically issues a K farthest neighbor search. The search returns the found distant neighbors and their delays to node P, and P stores these distant neighbors into its concentric ring according to the corresponding delay values.
Due to space limits, the details of the K closest and K farthest neighbor searches are omitted here; they can be found in the full technical report [36].

3) Replacing Suboptimal Neighbors Without Probes: To bound the memory overhead of the concentric ring, we need to manage the size of the concentric rings when some rings reach their maximum capacity $\lambda$. To reduce the CPU costs of frequent ring management, we lower its frequency: we set up an additional tolerance threshold $\lambda_t$ for each ring, begin ring management only when some rings have at least $\lambda + \lambda_t$ neighbors, and then remove $\lambda_t$ neighbors from each ring that has at least $\lambda + \lambda_t$ neighbors.

When we need to remove $\lambda_t$ neighbors from some rings, we follow the removal philosophy of Meridian: preserve the neighbors that maximize the diversity of a ring, using the maximal hypervolume polytope algorithm [13]. This is because higher diversity in the neighbor set translates to better chances of locating a nearby node for any target. However, the maximal hypervolume polytope algorithm requires all-pair delay measurements among the nodes in a ring, which needs $O(\lambda^2)$ probes. To avoid such measurements, we use network coordinates for delay prediction instead.
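The batched ring trimming can be sketched as below. The sketch is hypothetical in two ways: it uses coordinate distances rather than measured delays, and it replaces Meridian's maximal hypervolume polytope algorithm with a simpler greedy heuristic that repeatedly drops the member closest to another member. It only illustrates the trigger at ring capacity plus tolerance (`lam + lam_t`) and the removal of a tolerance's worth (`lam_t`) of neighbors.

```python
import math
from typing import Dict, List, Sequence

Coord = Sequence[float]


def _nearest_gap(node: str, members: List[str], coords: Dict[str, Coord]) -> float:
    """Coordinate distance from `node` to the closest other member of the ring."""
    return min(math.dist(coords[node], coords[o]) for o in members if o != node)


def trim_ring(ring: List[str], coords: Dict[str, Coord], lam: int, lam_t: int) -> List[str]:
    """Drop lam_t members once the ring holds at least lam + lam_t neighbors.

    Greedy diversity heuristic (a stand-in for the maximal hypervolume polytope):
    repeatedly remove the member that sits closest to another member.
    """
    if lam < 1 or lam_t < 1:
        raise ValueError("lam and lam_t must be positive")
    if len(ring) < lam + lam_t:
        return list(ring)  # below the trigger: skip this management round
    kept = list(ring)
    for _ in range(lam_t):
        victim = min(kept, key=lambda n: _nearest_gap(n, kept, coords))
        kept.remove(victim)
    return kept
```

Because coordinates are already cached for every ring member, this costs no probes, which is the point of replacing all-pair measurements with predictions.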

For delay prediction, we use the revised Vivaldi algorithm [21] that is robust to TIVs [20]. We denote the revised Vivaldi [20] as TIV-Vivaldi($x_i$, $e_i$, $d_{ij}$, $x_j$, $e_j$), where the inputs $x_i$ and $x_j$ denote the coordinates of nodes i and j, respectively; $d_{ij}$ is the measured delay between them; and $e_i$ and $e_j$ denote the averaged errors of node i's and j's coordinates, respectively. The outputs of TIV-Vivaldi are the updated coordinate $x_i$ and coordinate error $e_i$ of node i.
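The call shape TIV-Vivaldi($x_i$, $e_i$, $d_{ij}$, $x_j$, $e_j$) can be illustrated with the original Vivaldi update rule of Dabek et al., shown below in Python. This is a stand-in, not TIV-Vivaldi: the TIV-aware adjustments of [20] are not reproduced, and the constants `C_E` and `C_C` are assumed defaults rather than values from this paper.

```python
import math
import random
from typing import Optional, Sequence, Tuple

C_E = 0.25  # error smoothing constant (assumed default)
C_C = 0.25  # adaptive timestep constant (assumed default)


def vivaldi_update(x_i: Sequence[float], e_i: float, d_ij: float,
                   x_j: Sequence[float], e_j: float,
                   rng: Optional[random.Random] = None) -> Tuple[Tuple[float, ...], float]:
    """One plain Vivaldi step for node i after measuring delay d_ij to node j.

    Returns the updated coordinate and coordinate error of node i.
    """
    if d_ij <= 0:
        raise ValueError("measured delay must be positive")
    rng = rng or random.Random()
    diff = [a - b for a, b in zip(x_i, x_j)]
    dist = math.hypot(*diff)
    # Sample weight: node i moves more when its own error is large relative to j's.
    w = e_i / (e_i + e_j) if e_i + e_j > 0 else 0.5
    # Relative error of this sample, then an exponentially weighted error update.
    e_s = abs(dist - d_ij) / d_ij
    new_e = e_s * C_E * w + e_i * (1 - C_E * w)
    # Unit vector pointing from j to i (random direction if the coordinates coincide).
    if dist == 0:
        diff = [rng.uniform(-1.0, 1.0) for _ in diff]
    norm = math.hypot(*diff) or 1.0
    delta = C_C * w
    # Move along the unit vector by a fraction of the prediction error.
    new_x = tuple(a + delta * (d_ij - dist) * (b / norm) for a, b in zip(x_i, diff))
    return new_x, new_e
```

For example, with $x_i = (0, 0)$, $x_j = (10, 0)$, equal errors of 1.0 and a measured 20 ms, node i moves to $(-1.25, 0)$ and its error drops to 0.9375.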

Each service node passively maintains a coordinate and estimates delays using coordinate distances. In addition, to estimate delays to the neighbors in its concentric ring, each service node also stores its neighbors' coordinates.

Since delays vary, each node periodically updates its own and cached coordinates. Rather than introducing additional delay probes, we update coordinates by reusing the delay measurements to other service nodes collected during the biased sampling procedure, which significantly reduces maintenance costs compared to Meridian. Specifically, each node receiving a gossip message piggybacks its coordinate on the gossip ACK. After receiving the coordinate from the gossip receiver, the gossip sender stores the receiver's new coordinate and updates its own coordinate by invoking TIV-Vivaldi with the delay obtained during the gossip exchange.

D. Select Candidate Neighbors 

Assume that node P receives a DNNS query for the target T. Following the sampling conditions of the simple DNNS method, node P needs to select $3(\rho^2/\beta)^{\alpha}$ neighbors whose delays to node P lie in the range $[0, \rho d_{PT}]$. Since each ring contains $O(\gamma_g)$ neighbors, we simply select all neighbors in the rings numbered $[1, \lceil \log_2(\rho d_{PT}) \rceil]$ as candidate neighbors.

Furthermore, we prune several kinds of neighbors that may mislead the DNNS process. First, candidate neighbors with too few non-empty rings are likely to provide no hints for continuing the DNNS query because of their sparse coverage of the delay space, so the query can be trapped in a local minimum. Therefore, we remove all neighbors with fewer than $\tau$ non-empty rings ($\tau = 4$ by default). Second, all neighbors that have already received the same DNNS query should be removed to avoid search loops: defining the forwarding path of a DNNS query as the sequence of nodes that forwarded it, we remove every node on this path.
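Candidate selection and the two pruning rules can be combined into one function. In this sketch, `rings` maps ring numbers to neighbor IDs, and `nonempty_rings` is a hypothetical per-neighbor count of non-empty rings that we assume is learned during gossip; the paper does not specify how node P obtains this count. The default threshold of 4 follows the text.

```python
import math
from typing import Dict, List, Sequence


def select_candidates(rings: Dict[int, List[str]], d_pt: float, rho: float,
                      nonempty_rings: Dict[str, int], forwarding_path: Sequence[str],
                      tau: int = 4) -> List[str]:
    """Collect neighbors in rings 1..ceil(log2(rho * d_PT)) and apply both pruning rules."""
    if d_pt <= 0:
        raise ValueError("d_PT must be positive")
    max_ring = max(1, math.ceil(math.log2(rho * d_pt)))
    on_path = set(forwarding_path)
    candidates = []
    for i in range(1, max_ring + 1):
        for q in rings.get(i, []):
            if q in on_path:
                continue  # already forwarded this query: avoid search loops
            if nonempty_rings.get(q, 0) < tau:
                continue  # sparse neighbor: likely to lead into a local minimum
            candidates.append(q)
    return candidates
```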
E. Coordinate Maintenance for Targets

To reduce delay measurement costs, we predict delays from service nodes to the target, reusing the network coordinates that each service node already computes during neighbor management (Sec. VII-C3).

Unfortunately, we may not know the coordinate of the target, as the target can be any machine on the Internet. Therefore, we propose to compute the target's coordinate on the fly using TIV-Vivaldi.

First, when node P receives the DNNS query for a target T, node P initializes the network coordinate $x_T$ of target T if T's coordinate is not stored in the DNNS query message. To that end, node P asks a fixed number of neighbors (at most 10) to directly probe the target T. Then, node P updates T's coordinate $x_T$ and coordinate error $e_T$ by running TIV-Vivaldi on the coordinates of these neighbors and their delay measurements to T. Finally, node P stores T's coordinate in the DNNS query and forwards the query to the next-hop node for recursive search. This completes the coordinate initialization for the target T.

Second, after T's coordinate is initialized, each node Q that forwards the DNNS query updates T's coordinate to improve its convergence. To that end, node Q applies TIV-Vivaldi to update T's coordinate $x_T$ and coordinate error $e_T$, using Q's coordinate and the delay $d_{QT}$ to the target T.

F. Determine Closest Neighbor 

After assigning a network coordinate to the target in Sec. VII-E, we can use network coordinate distances to approximate real-world delays and reduce measurement costs. Nevertheless, since coordinate distances are only approximations, the closest neighbors selected by network coordinates may differ from the real ones.

Therefore, we locate the closest neighbor to the target T among the candidate neighbors found in Sec. VII-D by combining delay predictions with a small number of direct probes.

First, based on the coordinate distances from the candidate neighbors to target T, we take the top-$m$ candidate neighbors nearest to T as the set $S_c$.

Second, since coordinate distances may be erroneous, we also choose the set $S_e$ of candidate neighbors whose coordinates are not reliable. Since each TIV-Vivaldi coordinate $x_i$ is accompanied by a coordinate error metric $e_i$ [20], we treat neighbors whose coordinate errors exceed a threshold as unreliable. We found that setting the threshold to 0.7 significantly reduces the negative impact of coordinate inaccuracies.

Third, to adapt to coordinate errors caused by TIVs, since high coordinate distance errors indicate violations of the triangle inequality [20], we simply include the set $S_t$ of all candidate neighbors whose coordinate distance and real delay to the current node P differ by more than 50 ms; this threshold gives a good tradeoff between accuracy and bandwidth costs.



Finally, using the union of the selected candidate neighbors $S^* = S_c \cup S_e \cup S_t$, the current node P asks the neighbors in $S^*$ to probe their delays to target T, from which node P determines the closest neighbor. Ties are broken by choosing the neighbor with the most accurate coordinate.
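The construction of $S^* = S_c \cup S_e \cup S_t$ and the final probing step can be sketched as follows. The parameter `m` is left unspecified in the text, so its default of 3 is hypothetical; the 0.7 error threshold and the 50 ms TIV threshold are the values stated above. The `probe` callable is a placeholder for asking a neighbor to measure its delay to T, and the final tie-break on node ID is added only to make the result deterministic.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def closest_neighbor(candidates: Sequence[str],
                     pred_to_target: Dict[str, float],
                     coord_error: Dict[str, float],
                     pred_to_p: Dict[str, float],
                     meas_to_p: Dict[str, float],
                     probe: Callable[[str], float],
                     m: int = 3,
                     err_threshold: float = 0.7,
                     tiv_threshold_ms: float = 50.0) -> Optional[Tuple[str, float, int]]:
    """Return (closest neighbor, its probed delay to T, number of probes issued)."""
    # S_c: top-m candidates by predicted (coordinate) delay to the target.
    s_c = sorted(candidates, key=lambda q: pred_to_target[q])[:m]
    # S_e: candidates whose coordinates are unreliable.
    s_e = [q for q in candidates if coord_error[q] > err_threshold]
    # S_t: candidates whose predicted and measured delays to P disagree (likely TIVs).
    s_t = [q for q in candidates if abs(pred_to_p[q] - meas_to_p[q]) > tiv_threshold_ms]
    s_star = set(s_c) | set(s_e) | set(s_t)
    if not s_star:
        return None
    probed = {q: probe(q) for q in s_star}  # direct probes only for S*
    # Smallest probed delay wins; ties go to the more accurate coordinate, then the ID.
    best = min(s_star, key=lambda q: (probed[q], coord_error[q], q))
    return best, probed[best], len(probed)
```

Candidates outside $S^*$ are never probed, so a neighbor with a confident but wrong coordinate can still be missed; that is the accuracy cost traded for fewer probes.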

G. Termination Test 

Recall from Sec. VII-A that HybridNN sets the delay reduction threshold $\beta$ to 1 in order to reduce the number of selected neighbors and obtain better approximation ratios for the found nearest neighbors. Therefore, when the closest neighbor selected in Sec. VII-F has a larger delay to the target than the current node P, node P terminates the DNNS query and returns the closest node found so far to the host that issued the DNNS query.

VIII. Extensions to HybridNN 

HybridNN can be readily extended to search for more than one nearest node. We give two examples, K closest neighbor search and K farthest neighbor search, both of which are used to over-sample neighbors in the network delay space and thereby increase the diversity of neighbor management.

A. K Distributed Nearest Neighbor Search 

The K Distributed Nearest Neighbor Search (KDN²S) aims to locate the K nearest neighbors to a target T, where K is a system parameter. To store the found nearest neighbors, we append to the DNNS query message M a new field M.Ω that caches them.

A naive KDN²S solution is based on a find-and-remove approach: we first find one closest neighbor to the target using the HybridNN algorithm, then delete the found nearest neighbor from the system, and restart the HybridNN algorithm from the same query node until we locate K nearest servers to the target. Nevertheless, deleting the closest neighbors from the system is impractical for a large-scale system due to the broadcast communication overhead, and repeated DNNS processes increase the query overhead for the service nodes on the DNNS forwarding paths.

On the other hand, assume that during a KDN²S query the concentric ring of each node gains no new neighbors, the network coordinate of each node stays unchanged, and network delays remain stable. Then the forwarding paths of consecutive DNNS queries starting from the same node in the naive KDN²S solution are temporally correlated: if we issue a new DNNS query from the same starting node immediately after the preceding one, the forwarding path of the new query, excluding its last hop, is a subpath of the forwarding path of the preceding query, since the intermediate nodes on the two forwarding paths are identical in HybridNN. Our assumption generally holds after the network coordinates converge and the concentric rings contain enough neighbors. Furthermore, end-to-end network delays have been shown to remain constant on the order of hours by Zhang and Duffield [46] as well as by the iPlane project [33].

Fig. 9. KDN²S.

Using the temporal correlation of consecutive forwarding paths from the same starting node, we propose a backtracking-based KDN²S algorithm, shown in Algorithm 1. After HybridNN finds one nearest neighbor and terminates at a service node $P_i$, we resume the KDN²S query by backtracking from $P_i$ to its predecessor node $P_{i-1}$ on the DNNS forwarding path and recursively finding the nearest neighbor at $P_{i-1}$, until we locate K nearest neighbors. With backtracking, KDN²S resumes the query at service nodes that are close to the target, so we can quickly locate new nearest neighbors with less forwarding overhead than the naive KDN²S solution.

Fig. 9 gives an example of KDN²S using Algorithm 1. Suppose an end host A needs the two nearest neighbors to the target T. Node A sends a KDN²S request to a service node B. Then B starts the KDN²S by forwarding a KDN²S query M to a neighbor $P_1$ closer to T. Similarly, $P_1$ forwards the query M to $P_2$. Node $P_2$ cannot find a neighbor closer to the target T than itself; therefore, $P_2$ is the first nearest neighbor to the target, and $P_2$ appends its address to M.Ω as a found nearest neighbor. Next, $P_2$ triggers the KDN²S backtracking step by forwarding M to its predecessor $P_1$ on the KDN²S forwarding path. On receiving M, $P_1$ excludes $P_2$ from the choice of candidate neighbors and finds a new neighbor $P_3$ closer to the target T than $P_1$. Then $P_1$ forwards M to $P_3$. $P_3$ decides that it is the closest node to T among its neighbors, so $P_3$ appends itself to M.Ω as a new nearest neighbor. Finally, $P_3$ sends the found nearest neighbors in M.Ω, i.e., $P_2$ and $P_3$, to the end host A, which completes the KDN²S.

B. K Distributed Farthest Neighbor Search 

Like KDN2S, K Distributed Farthest Neighbor Search (KDFNS) is based on the backtracking idea. First, we locate one farthest neighbor and terminate at a service node P; then we backtrack from node P to its predecessor node on the forwarding path to recursively locate the remaining K − 1 farthest neighbors.

To locate one farthest neighbor, we recursively forward the KDFNS query to a service node $P_1$ that is at least $(1 + \beta_{farthest})$ times farther from the target T than the current service node P ($\beta_{farthest}$ is 1.2 by default). In other words, we need to locate a node that is not covered by the ball $B_T((1 + \beta_{farthest}) d_{PT})$. Since $B_T((1 + \beta_{farthest}) d_{PT}) \subseteq B_P(\rho (1 + \beta_{farthest}) d_{PT})$ by the sandwich lemma (Lemma VI.2), it suffices for $P_1$ to be at least $\rho (1 + \beta_{farthest}) d_{PT}$ away from node P.

Algorithm 1: The pseudo-code of KDN2S.
1: KDN2S(P, T, K, M)
2: {Input: current node P, the target T, required number of closest neighbors K, query message M}
3: {Output: nearest neighbors to T}
4: if |M.Ω| == K then
5:   return M.Ω; {enough closest neighbors}
6: end if
7: S ← chooseCandidates(P, T, M);
8: S ← S − M.Ω; {remove found nearest neighbors to avoid search loops}
9: x_T ← InitTargetCoord(P, T);
10: [u_1, S_c, D_T] ← NearestDetector(P, S, x_T, M);
11: [φ_1, d_{φ_1 T}, P_1] ← TerminateTest(P, u_1, S_c, D_T, M); {find one closest neighbor, and terminate at node P_1}
12: M.Ω ← M.Ω ∪ {φ_1}; {cache φ_1 into the query message}
13: Select the predecessor node P_2 of node P_1 on the forwarding path M.Path; {find the predecessor for backtracking}
14: return KDN2S(P_2, T, K, M); {recursive search}

Accordingly, in each search step, we try to find such a node $P_1$ in the concentric rings of the current service node P whose delay to P is larger than or equal to $\rho (1 + \beta_{farthest}) d_{PT}$. If such a node $P_1$ exists, then node $P_1$ recursively runs KDFNS in the same way as node P. Otherwise, if we cannot locate such a node, the search terminates, and the node currently farthest from the target is cached as a farthest neighbor to the target. Afterwards, we select the remaining K − 1 farthest neighbors by a backtracking process similar to that of the K closest neighbor search.
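The candidate filter applied in each KDFNS step can be sketched in Python. This is an illustrative fragment assuming that a service node stores the measured delay to each of its ring members; the function name `farthest_candidates` and its arguments are hypothetical. The defaults follow the text ($\beta_{farthest}$ = 1.2) and Table I ($\rho$ = 3).

```python
from typing import Dict, Iterable, List


def farthest_candidates(ring_delays: Dict[str, float], d_pt: float,
                        rho: float = 3.0, beta_far: float = 1.2,
                        found: Iterable[str] = ()) -> List[str]:
    """Ring members whose delay to P is >= rho * (1 + beta_far) * d_PT.

    Under the rho-inframetric assumption, the sandwich lemma places such
    nodes outside B_T((1 + beta_far) * d_PT), i.e. at least (1 + beta_far)
    times farther from T than P. Ordered by decreasing delay to P.
    """
    threshold = rho * (1.0 + beta_far) * d_pt
    excluded = set(found)  # farthest neighbors already returned
    picked = [n for n, d in ring_delays.items()
              if d >= threshold and n not in excluded]
    return sorted(picked, key=lambda n: ring_delays[n], reverse=True)


# With d_PT = 10 ms the threshold is 3 * 2.2 * 10 = 66 ms.
print(farthest_candidates({"a": 80.0, "b": 50.0, "c": 120.0}, d_pt=10.0))  # ['c', 'a']
```

An empty result corresponds to the termination case described above: no ring member is far enough, so the current node is cached as a farthest neighbor.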

Algorithm 2 shows the complete KDFNS process. First, we choose candidate neighbors satisfying the delay constraint with respect to the current service node P. Then we find the farthest neighbor to the target (FarthestDetector()) by combining delay predictions with direct probes, in order to reduce the measurement overhead. Specifically, we choose the m farthest neighbors from the candidate neighbors; in addition, we add neighbors with uncertain coordinates and erroneous predictions, similar to Sec. VII-F. Next, we determine one farthest neighbor recursively (FarthestTerminateTest()). Finally, from the terminating node $P_1$, we backtrack to the predecessor node of $P_1$ on the forwarding path, and recursively run KDFNS until we locate enough farthest nodes to the target.
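The hybrid detection step (rank by prediction, then probe only a pruned set) can be sketched in Python as follows. Assumptions: coordinates are Euclidean vectors as in Vivaldi-style systems, each neighbor carries a scalar coordinate uncertainty compared against a hypothetical `err_threshold`, and `probe` stands in for a direct delay measurement. The helper name and signature are illustrative rather than the paper's API, and "erroneous predictions" are not modeled separately here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def farthest_detector(candidates: List[str],
                      coords: Dict[str, Sequence[float]],
                      target_coord: Sequence[float],
                      uncertainty: Dict[str, float],
                      probe: Callable[[str], float],
                      m: int = 4,
                      err_threshold: float = 0.5) -> Tuple[str, float, int]:
    """Pick the candidate farthest from the target using few direct probes.

    `candidates` must be non-empty. Returns (best node, its measured delay,
    number of probes issued).
    """
    # 1) Rank candidates by predicted (coordinate-distance) delay to T.
    ranked = sorted(candidates,
                    key=lambda n: math.dist(coords[n], target_coord),
                    reverse=True)
    probe_set = ranked[:m]
    # 2) Also probe neighbors whose coordinates are too uncertain to trust.
    probe_set += [n for n in candidates
                  if uncertainty[n] > err_threshold and n not in probe_set]
    # 3) Direct probes replace the predictions for the pruned set only.
    measured = {n: probe(n) for n in probe_set}
    best = max(measured, key=measured.get)
    return best, measured[best], len(probe_set)


coords = {"a": (1.0,), "b": (5.0,), "c": (9.0,), "d": (3.0,), "e": (7.0,)}
unc = {"a": 0.9, "b": 0.1, "c": 0.1, "d": 0.1, "e": 0.1}
actual = {"a": 20.0, "b": 5.0, "c": 9.0, "d": 3.0, "e": 7.0}
print(farthest_detector(list(coords), coords, (0.0,), unc, actual.get, m=2))  # ('a', 20.0, 3)
```

In the example, node a looks close by its coordinate but is uncertain, so it is probed anyway and turns out to be the farthest; only 3 of the 5 candidates are measured.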

IX. Simulation 

In this section, we report the results of simulation experiments based on the real-world data sets in Sec. IV.

A. Experimental Setup 

We compare HybridNN with several DNNS algorithms. (1) Vivaldi. We compute the coordinate of each node based on the Vivaldi algorithm [31], and find the nearest service nodes for each requesting node using the shortest coordinate distances. The coordinate dimension for Vivaldi is 5. (2) CoordNN. To quantify the usefulness of the direct probes in HybridNN, we present a DNNS algorithm CoordNN, which is identical to HybridNN except that it uses only coordinate distances and no direct probes when determining the best next-hop neighbors. (3) DirectDN2S. To evaluate HybridNN, we present a DNNS algorithm DirectDN2S, which is identical to HybridNN except that it only uses direct probes for finding the best next-hop neighbors, without pruning neighbors based on coordinate distances as HybridNN does. (4) Meridian [13]. Meridian recursively forwards the DNNS queries to a node that is β times closer to the target than the current node, and returns the found nearest neighbor when no such node is found. We configure the Meridian algorithm identically to the original configuration by Wong et al. [13], with the delay reduction threshold β set to 0.5, the upper bound on the size of each ring set to 10, and the number of rings in the concentric rings set to 20.

Algorithm 2: The pseudo-code of KDFNS.
1: KDFNS(P, T, K, M)
2: {Input: current node P, the target T, required number of farthest neighbors K, query message M}
3: {Output: farthest neighbors to T}
4: if |M.Ω| == K then
5:   return M.Ω; {complete the KDFNS}
6: end if
7: S ← chooseFarthestCandidates(P, T, M); {choose neighbors whose delays to P are larger than or equal to ρ(1 + β_farthest) d_PT}
8: S ← S − M.Ω; {remove found farthest neighbors to avoid search loops}
9: x_T ← InitTargetCoord(P, T);
10: [u_1, S_c, D_T] ← FarthestDetector(P, S, x_T, M); {select the farthest neighbor to T from S}
11: [φ_1, d_{φ_1 T}, P_1] ← FarthestTerminateTest(P, u_1, S_c, D_T, M); {find one farthest neighbor, and terminate at node P_1}
12: M.Ω ← M.Ω ∪ {φ_1}; {cache φ_1 into the query message}
13: Select the predecessor node P_2 of node P_1 on the forwarding path M.Path; {find the predecessor for backtracking}
14: return KDFNS(P_2, T, K, M); {recursive search}

TABLE I
Parameter values of HybridNN for simulation.

Parameter   Meaning                                              Value
A           maximal size of the ring                             8
A + A_t     threshold of the ring size for ring updates          10
β           nearest search threshold                             1
ρ           inframetric parameter                                3
|x|         coordinate dimension                                 5
K           size of sampled neighbors for neighbor discovery     10
m           number of neighbors for direct probes                4
τ           number of non-empty rings                            4

For HybridNN, the default configuration is summarized in Table I. CoordNN and DirectDN2S share identical parameters with HybridNN. We also evaluated the sensitivity of HybridNN to its parameters and found it reasonably robust against the parameter choices. The detailed sensitivity results for the system parameters of HybridNN can be found in the technical report published online [36].

We have developed a discrete-time simulator for DNNS. The simulator randomly chooses a set of nodes as service nodes (500 by default) that can receive DNNS queries. The other nodes in the system are clients that can issue DNNS queries to these service nodes. For Host479, 200 nodes are the service nodes. The DNNS queries are repeated 10,000 times. For each DNNS query, we uniformly select one client as the target machine, and a random service node to receive the query. In addition, the simulation is repeated 5 times, shuffling the set of service nodes each time to avoid biases in choosing service nodes. For HybridNN, CoordNN, DirectDN2S and Meridian, the inter-gossip times for neighborhood discovery are drawn from an exponential distribution with an expected value of 1 second, and the inter-ring-management times are drawn from an exponential distribution with an expected value of 2 seconds. For HybridNN, DirectDN2S and CoordNN, the time interval between two oversampling events of the K closest neighbor search and the K farthest neighbor search is drawn from an exponential distribution with an expected value of 60 seconds. The inter-DNNS-query times also follow an exponential distribution with an expected value of 60 seconds. For Vivaldi, the coordinate of each node is updated for 1000 rounds, by uniformly selecting a service node as the counterpart in each round.
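A per-node event generator with the exponential inter-event times listed above might look like the following Python sketch. The event names, the single time-ordered heap, and treating every event stream (including DNNS queries) as a per-node timer are assumptions for illustration; the simulator's own code is not described here.

```python
import heapq
import random
from typing import List, Tuple

# Mean inter-event times in seconds, taken from the simulation setup.
MEAN_INTERVAL = {
    "gossip": 1.0,           # neighborhood discovery
    "ring_management": 2.0,
    "oversampling": 60.0,    # K closest / K farthest neighbor search
    "dnns_query": 60.0,
}


def event_schedule(horizon: float, seed: int = 0) -> List[Tuple[float, str]]:
    """Generate (time, event kind) pairs for one node up to `horizon` seconds."""
    rng = random.Random(seed)
    heap: List[Tuple[float, str]] = []
    for kind, mean in MEAN_INTERVAL.items():
        # expovariate takes the rate, i.e. 1 / mean.
        heapq.heappush(heap, (rng.expovariate(1.0 / mean), kind))
    events: List[Tuple[float, str]] = []
    while heap:
        t, kind = heapq.heappop(heap)
        if t > horizon:
            break  # the heap is time-ordered, so every remaining event is later
        events.append((t, kind))
        gap = rng.expovariate(1.0 / MEAN_INTERVAL[kind])
        heapq.heappush(heap, (t + gap, kind))
    return events


one_hour = event_schedule(3600.0)
print(sum(kind == "gossip" for _, kind in one_hour))  # roughly 3600 gossip events
```

Each stream is a Poisson process, so over one simulated hour a node sees about 3600 gossip events, 1800 ring-management events and about 60 each of the slower events.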

The performance metrics for each DNNS query include: (1) Absolute Error: defined as the difference between the delay from the estimated nearest neighbor j to the target T and the delay from the real nearest neighbor i to T, i.e., $d_{jT} - d_{iT}$. (2) Relative Error: defined as the ratio of the absolute error for the estimated nearest neighbor j to the delay between the real nearest neighbor i and the target T, i.e., $\frac{d_{jT} - d_{iT}}{d_{iT}}$. The absolute error quantifies the increase in delay of the estimated nearest neighbors, while the relative error measures the multiplicative ratio to the optimal delay values. Therefore, large relative errors do not necessarily correspond to large absolute errors (e.g., when $d_{iT}$ is small). (3) Search Hop: defined as the number of service nodes on the forwarding path minus one. Therefore, if node A forwards a DNNS query to node B and node B returns the nearest neighbor to the query host, the search hop count for the DNNS query is one.
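The three metrics can be computed as in this short Python sketch; the function names and the dictionary layout are illustrative assumptions. The two usage calls show why a large relative error need not mean a large absolute error.

```python
from typing import Dict, List, Tuple


def dnns_errors(found: str, delays_to_target: Dict[str, float]) -> Tuple[float, float]:
    """Absolute and relative error of a DNNS answer.

    `delays_to_target` maps every service node to its delay to the target T;
    the real nearest neighbor i is the service node with the minimum delay.
    Assumes that minimum delay is positive.
    """
    d_iT = min(delays_to_target.values())
    d_jT = delays_to_target[found]
    absolute = d_jT - d_iT
    return absolute, absolute / d_iT


def search_hops(path: List[str]) -> int:
    """Number of service nodes on the forwarding path minus one."""
    return len(path) - 1


print(dnns_errors("b", {"a": 1.0, "b": 3.0}))      # (2.0, 2.0): small absolute, large relative
print(dnns_errors("b", {"a": 100.0, "b": 120.0}))  # (20.0, 0.2): large absolute, small relative
print(search_hops(["A", "B"]))                      # 1
```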

B. Comparison 

Absolute Error. Fig. 10 shows the absolute errors of the different algorithms. DirectDN2S achieves the lowest absolute errors on all data sets except Host479. HybridNN is close to DirectDN2S in terms of absolute errors, and HybridNN is the most accurate on the Host479 data set. CoordNN is worse than both DirectDN2S and HybridNN. The higher accuracy of DirectDN2S and HybridNN compared to CoordNN indicates that utilizing direct probes greatly reduces the estimation errors, while using coordinate distances alone can lead to bad local minima.

The inaccuracy of DirectDN2S compared to HybridNN on the Host479 data set is rather counter-intuitive. It may be caused by the asymmetry of the delays in this data set, which misleads the greedy search into a local minimum, since DirectDN2S is more accurate than HybridNN on the other three data sets, whose pairwise delays are all symmetric. On the other hand, HybridNN does not always choose the neighbor closest to the target as the forwarding node, since HybridNN also incorporates approximate delay predictions when choosing neighbors, which can help HybridNN bypass bad local minima caused by delay asymmetry.
Fig. 10. The CCDFs of absolute errors.



Fig. 11. The CCDFs of relative errors. 



Furthermore, Meridian shows greater absolute errors than the other algorithms, including Vivaldi, which implies that coordinate distances are at least reasonably effective when used in a centralized approach. We note that the superiority of Vivaldi over Meridian in most cases is consistent with the experiments independently performed by Choffnes and Bustamante [42]. The main reasons for the lower accuracy of Meridian are the local minima caused by triangle inequality violations (TIVs) and by clustering in the delay space. On the other hand, Vivaldi can adapt to TIVs using adaptive coordinate movements.

Relative Error. Fig. 11 shows the relative errors of the DNNS algorithms. The results are consistent with those of the absolute errors. DirectDN2S achieves near-zero relative errors for most DNNS queries on all data sets except Host479. HybridNN and DirectDN2S have similar accuracy, while HybridNN is more accurate than DirectDN2S on Host479. Furthermore, CoordNN is less accurate than HybridNN, while Meridian and Vivaldi are less accurate than DirectDN2S, HybridNN and CoordNN.
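Figs. 10 and 11 plot complementary CDFs. A CCDF point for an error value x is the fraction of queries whose error strictly exceeds x; a minimal Python helper (the name `ccdf` is illustrative) that produces such points from per-query errors is:

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ccdf(samples: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, P[X > x]) for each distinct sample value x."""
    xs = sorted(samples)
    n = len(xs)
    # bisect_right(xs, x) counts the samples that are <= x.
    return [(x, (n - bisect_right(xs, x)) / n) for x in sorted(set(xs))]


print(ccdf([0.0, 0.0, 1.0, 2.0]))  # [(0.0, 0.5), (1.0, 0.25), (2.0, 0.0)]
```

A curve that drops to zero quickly means few queries with large errors, which is how near-zero relative errors show up in such plots.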

Search hops. Next, we quantify the distributions of the number of search hops for the DNNS algorithms, as shown in Fig. 12. Recall that the number of search hops equals the number of service nodes on the DNNS forwarding path minus one.

We can see that the number of search hops of most DNNS queries is rather modest for all DNNS algorithms. Meridian takes 2 search hops in about 80% of the cases, while HybridNN and DirectDN2S take no more than 3 search hops in over 80% of the cases.

Moreover, almost all searches of Meridian, HybridNN and DirectDN2S take fewer than 6 search hops. On the other hand, CoordNN takes more search hops than Meridian, HybridNN and DirectDN2S, and a fraction of its searches even exceed 10 hops on all data sets.



C. Sensitivity of Parameters 

In this section, we evaluate the robustness of HybridNN to the system size as well as to the choices of system parameters.
1) System Size N: To evaluate the impact of the number of service machines on the performance of HybridNN, we increase the percentage of service machines. We select target machines randomly from all nodes, including both the clients and the service machines, since the set of clients shrinks as the percentage of service machines increases. Fig. 13 shows the performance of HybridNN as the percentage of service nodes increases. HybridNN achieves similar accuracy as the share of service nodes grows relative to clients; therefore, HybridNN is quite robust to different system scales. On the other hand, the query loads of HybridNN increase slowly; for example, HybridNN roughly doubles its loads when the percentage of service nodes reaches 1 (i.e., 100%).

2) Inframetric ρ: Fig. 14 shows the accuracy and loads as the inframetric parameter ρ increases. The accuracy of HybridNN is insensitive to the choice of ρ. This is because most delays have quite low ρ-edge metrics; therefore, even with a lower ρ we can cover the possible best next-hop neighbors for DNNS queries. Furthermore, although a larger ρ increases the number of possible next-hop candidate neighbors, the loads of HybridNN's DNNS queries remain stable for different ρ, because we probe a nearly constant number of next-hop nodes. In addition, the standard deviations of errors are quite low for most data sets.



Fig. 14. Inframetric ρ.

3) Non-Empty Threshold τ: Fig. 15 shows the accuracy and loads as the non-empty threshold for pruning candidate next-hop neighbors increases. As this threshold, which prunes candidate neighbors that have too few rings containing nodes, increases, the standard deviation of HybridNN's errors decreases until the threshold reaches 4 and then increases once the threshold exceeds 4, while the median errors increase once the threshold exceeds 8. Moreover, the loads decrease as the threshold increases. Therefore, selecting modest-sized non-empty thresholds (e.g.,

Fig. 15. Non-Empty Threshold.

4) Coordinate Dimension |x|: Fig. 16 illustrates the accuracy and loads as the coordinate dimension changes. HybridNN achieves similar accuracy and loads, since the accuracy of the coordinates remains stable once the dimension exceeds 3. Therefore, HybridNN adapts to the varying accuracy of coordinates of different dimensions without increasing DNNS query loads.

Fig. 16. Coordinate Dimension.



5) Nodes Per Ring A: Fig. 17 describes the performance of HybridNN with increasing upper bounds on the number of nodes per ring. HybridNN achieves high accuracy even when the size of one ring is as small as 5. This is because HybridNN selects neighbors from a broader range [0, ρd], where d is the delay from the current node to the target. In addition, the loads of HybridNN grow slowly as the ring size increases, because HybridNN uses coordinate distances to select a limited number of candidate neighbors.
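The range [0, ρd] follows from the inframetric inequality in the form used in the sandwich lemma. Taking the current node P, the target T and $d = d_{PT}$, any node i that is no farther from T than P (measured here as $d_{Ti} \le d_{PT}$, an assumption that only matters when delays are asymmetric) satisfies

```latex
d_{Pi} \;\le\; \rho \max\{d_{PT},\, d_{Ti}\} \;\le\; \rho\, d_{PT} \;=\; \rho d ,
```

so restricting next-hop candidates to ring members with delays in [0, ρd] does not exclude any node that could be closer to the target.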

Fig. 17. Nodes Per Ring.

6) Over-Sampled Nearest and Farthest Nodes K: Fig. 18 illustrates the performance of HybridNN as the over-sampled number K of nearest and farthest nodes varies. HybridNN achieves similar accuracy and loads for different over-sampled sizes K of nearest and farthest neighbors. This is because we start the over-sampling process periodically, which accumulates many nearby and far-away nodes over time.

Fig. 18. Over-sampled number of neighbors.

7) Returned Nodes For Next-Hop Probe m: Fig. 19 plots the median errors and loads of HybridNN as the number m of returned nodes for next-hop probes increases. For all data sets, HybridNN is accurate once the number of estimated nearest candidate neighbors for direct probes exceeds 2. Moreover, the loads of HybridNN increase slowly as the number of relaxed probes increases. This is because we also add neighbors with highly uncertain coordinates, which dampens the growth in overhead from additional relaxed probes. In addition, we found during the experiments that the search process typically terminates after 3 to 5 hops; therefore, the measurement overhead is mostly below 3 KB.

Fig. 19. Returned Nodes For Next-Hop Probes.



X. PlanetLab Experiments 

We have implemented a prototype DNNS query system in Java using an asynchronous communication library. We implemented both HybridNN and Meridian. The core DNNS logic consists of around 5,000 lines of code comprising three main modules: (1) a prober module, which uses kernel-level ping for delay measurements to alleviate application-level perturbations caused by high loads on PlanetLab nodes; (2) a neighborhood management module, which finds and maintains neighbors on the concentric rings; and (3) a DNNS module, which runs the HybridNN or Meridian algorithm.

Our objective is to compare the accuracy and efficiency of DNNS queries with related nearest server location methods in a real-world deployment. To that end, we choose 173 servers distributed globally on PlanetLab as the service nodes. Then we select another 412 PlanetLab servers as the target machines. Our experiments lasted one week, from 5 May 2011 to 12 May 2011.

We compare HybridNN with Meridian and iPlane [35]. We choose the same parameter configurations for HybridNN and Meridian as in the Simulation section (Sec. IX-A). For iPlane, we query iPlane to obtain the delays between the service nodes and the target machines, and then compute the nearest service node for each target machine.

In addition, in order to compare the found nearest servers to the ground-truth nearest servers, we compute the ground-truth nearest servers using direct probes (denoted as Direct). Specifically, since pairwise delays between PlanetLab machines keep varying due to routing dynamics, we first use the median delay of each node pair to summarize the long-term delay trend. Then we select the service node that has the lowest median delay to the target.
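The Direct baseline can be sketched in Python as below, assuming the repeated probe samples are stored per (service node, target) pair; the data layout and the function name are hypothetical. The example shows why the median is a sensible summary: a single routing-induced spike would distort a mean but barely moves the median.

```python
from statistics import median
from typing import Dict, List, Tuple


def ground_truth_nearest(
        samples: Dict[Tuple[str, str], List[float]]) -> Dict[str, str]:
    """Map each target to the service node with the lowest median delay."""
    best: Dict[str, Tuple[float, str]] = {}
    for (service, target), delays in samples.items():
        m = median(delays)  # long-term trend, robust to transient spikes
        if target not in best or m < best[target][0]:
            best[target] = (m, service)
    return {target: service for target, (_, service) in best.items()}


probes = {("s1", "t"): [10.0, 200.0, 12.0],   # one transient spike
          ("s2", "t"): [15.0, 15.0, 16.0]}
print(ground_truth_nearest(probes))  # {'t': 's1'}
```

With the mean instead of the median, s1 would average 74 ms and s2 would wrongly be chosen.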

A. Accuracy 

First, we compare the accuracy of the different methods with the absolute error and relative error metrics defined in Sec. IX-A. The results are shown in Fig. 20(a) and (b). HybridNN has significantly lower absolute and relative errors than Meridian. iPlane performs similarly to HybridNN, but incurs higher errors. The inaccuracy of iPlane is caused by the mismatch between the estimated routing paths and the real-world ones. The inaccuracy of Meridian shows that Meridian is easily trapped at local minima far away from the optimal solutions.

On the other hand, HybridNN and iPlane are much more accurate, which implies that HybridNN can avoid bad local minima in most cases. Nevertheless, HybridNN and iPlane also have around 3% of DNNS queries with relative errors above 10. We find that HybridNN incurs such high errors at the early stage, when nodes do not yet have enough neighbors in their concentric rings.

B. Completion Time 

Next, we evaluate the completion time of individual DNNS queries for HybridNN and Meridian. Empirically, we have found that both HybridNN and Meridian complete DNNS queries within three search hops, which is consistent with the simulation results in Fig. 12. However, the overall query time of DNNS searches depends not only on the number of search hops, but also on the completion time of message exchanges and delay probes.

Fig. 20(c) plots the distributions of the query time of HybridNN and Meridian. For around 85% of the DNNS queries, the query times of HybridNN are similar to those of Meridian; therefore, the query times of HybridNN and Meridian are similar in most cases. However, around 20% of the queries take much longer to answer in Meridian, and 10% have query times larger than 15 seconds, while the hybrid measurement approach of HybridNN avoids such large query latencies.

C. Query Overhead 

Next, to quantify the bandwidth overhead of the DNNS queries of HybridNN and Meridian, we define the load of a DNNS query as the total size of the packets transmitted during the DNNS process. We plot the CDFs of the loads for HybridNN and Meridian in Fig. 20(d). The load of HybridNN is significantly lower than that of Meridian. In more than 95% of the cases the load of HybridNN is less than 2 KBytes, while in more than 50% of the cases the load of Meridian is more than 10 KBytes, which is due to the large candidate neighbor set of Meridian's DNNS queries. Therefore, the delay estimation of HybridNN substantially reduces the measurement overhead.

D. Control Overhead 

To measure the maintenance efficiency of HybridNN and Meridian, we collected the bandwidth overhead of neighborhood management in HybridNN and Meridian for each service node every two minutes, as shown in Fig. 20(e). The maintenance overhead of Meridian includes both the gossip process and the ring maintenance costs, while the maintenance overhead of HybridNN includes the gossip messages, the K nearest neighbor search messages and the K farthest neighbor search messages. The average maintenance overhead of HybridNN is 2 KBytes per minute, while that of Meridian is over 20 KBytes per minute. Since the time interval of ring maintenance is identical for HybridNN and Meridian, the all-pair probes between nodes in the same ring are the main cause of the control overhead in Meridian. On the other hand, since HybridNN uses coordinate distances to update the rings, it does not need to perform all-pair probes between nodes in a ring.
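To see why all-pair ring probing can dominate Meridian's control traffic, one can count probes per maintenance round. The Python sketch below assumes that each unordered pair of members in a ring is probed once per round (an implementation may probe each pair from both ends) and uses the Meridian configuration from the simulation setup (at most 10 nodes per ring, 20 rings) as an upper-bound example. HybridNN's coordinate-based ring updates need no such pairwise probes.

```python
from typing import Iterable


def all_pair_probes(ring_sizes: Iterable[int]) -> int:
    """Probes needed if every unordered pair inside each ring is measured once."""
    return sum(k * (k - 1) // 2 for k in ring_sizes)


# Upper bound for the Meridian setting: 20 full rings of 10 nodes each.
print(all_pair_probes([10] * 20))  # 900
```

The count grows quadratically with the ring size, whereas coordinate distances between ring members can be computed locally without network traffic.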

XI. Conclusion and Future Work 

We have addressed the problem of designing an accurate and efficient DNNS algorithm in a comprehensive way. We first formulated the DNNS problem to account for both symmetric and asymmetric delay metrics for latency optimizations. Given the generalized delay metrics, we proposed to use the relaxed inframetric to model the delay space as a foundation for designing new DNNS algorithms with strong theoretical guarantees concerning the search overhead and the accuracy of the search results.

Next, we applied all the insights gained to design a new DNNS algorithm called HybridNN. HybridNN locates nearest neighbors for any target with low bandwidth costs. For locating closer servers to any target, HybridNN maximizes the diversity of the neighbor set by discovering neighbors within each delay range through a lightweight neighbor sampling process. Next, in order to reduce the measurement costs of locating closer servers, HybridNN combines network-coordinate-based delay estimation and direct probes for fast and efficient nearest neighbor determination. Although the symmetric coordinate distances may deviate from the asymmetric delays, HybridNN is able to locate the nearest neighbor to the target at each search step, since we use direct probes to replace erroneous delay estimations. Finally, HybridNN terminates the search process conservatively in order to obtain better approximations of the nearest neighbors. We confirmed the efficiency and effectiveness of HybridNN with extensive simulations and a prototype deployment on PlanetLab. HybridNN can locate approximately closest neighbors quickly with low measurement costs.

As future work, we plan to pursue two lines of research. First, we currently use a revised Vivaldi to estimate delays, which mismatches the asymmetric delay metric due to the symmetry of coordinate distances. We plan to extend Vivaldi to asymmetric delay metrics. Second, we plan to study in-advance DNNS probing in order to hide the waiting time of on-demand DNNS queries for more practical latency optimizations.
Appendix



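Lemma VI.1: Given a ρ-inframetric with growth $\gamma_g > 1$, for any $x > \rho$, $r > 0$ and any node P, the volume of the ball $B_P(r)$ is at most $x^{\alpha}$ times smaller than that of the ball $B_P(xr)$, i.e., $|B_P(xr)| \le x^{\alpha} |B_P(r)|$, where $\log_\rho \gamma_g \le \alpha < 2\log_\rho \gamma_g$.

Proof: First, according to the definition of the growth, it follows that

$$|B_P(xr)| \le \gamma_g \left|B_P\left(\frac{x}{\rho} r\right)\right|.$$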



JOURNAL OF LHeX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 






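Then, by recursively applying the growth definition $\lceil \log_\rho x \rceil$ times, until $\frac{x}{\rho^{\lceil \log_\rho x \rceil}} \le 1$, we obtain

$$|B_P(xr)| \le \gamma_g^{\lceil \log_\rho x \rceil} |B_P(r)| = x^{\log_x \gamma_g \lceil \log_\rho x \rceil} |B_P(r)| = x^{\alpha} |B_P(r)|, \quad \alpha = \log_x \gamma_g \lceil \log_\rho x \rceil.$$

Therefore, by the definition of the ceiling function, we can calculate the lower bound of α as:

$$\alpha \ge \log_x \gamma_g \cdot \log_\rho x = \log_\rho \gamma_g.$$

On the other hand, since $x > \rho > 1$ and $\gamma_g > 1$, we get

$$\frac{1}{\log \rho} > \frac{1}{\log x}, \quad \text{i.e.,} \quad \log_x \gamma_g < \log_\rho \gamma_g,$$

thus we can compute the upper bound of α as:

$$\alpha < \log_x \gamma_g (\log_\rho x + 1) = \log_\rho \gamma_g + \log_x \gamma_g < \log_\rho \gamma_g + \log_\rho \gamma_g = 2\log_\rho \gamma_g,$$

which concludes the proof. ■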
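As a numerical sanity check of Lemma VI.1 (not a substitute for the proof), the exponent $\alpha = \log_x \gamma_g \lceil \log_\rho x \rceil$ from the proof can be evaluated for random parameters with $x > \rho > 1$ and $\gamma_g > 1$. The Python sketch below checks both bounds with a small tolerance for floating-point rounding; the sampling ranges and function names are arbitrary illustrative choices.

```python
import math
import random


def growth_exponent(gamma_g: float, rho: float, x: float) -> float:
    """alpha = log_x(gamma_g) * ceil(log_rho(x)), as in the proof of Lemma VI.1."""
    return math.log(gamma_g, x) * math.ceil(math.log(x, rho))


def check_bounds(trials: int = 10_000, seed: int = 1, eps: float = 1e-9) -> bool:
    """Check log_rho(gamma_g) <= alpha < 2 log_rho(gamma_g) on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        rho = rng.uniform(1.1, 5.0)
        gamma_g = rng.uniform(1.01, 10.0)
        x = rho * rng.uniform(1.0001, 100.0)  # guarantees x > rho
        lower = math.log(gamma_g, rho)
        alpha = growth_exponent(gamma_g, rho, x)
        if not (lower - eps <= alpha < 2 * lower + eps):
            return False
    return True


print(check_bounds())  # True
```

For instance, with $\gamma_g = 2$, $\rho = 2$ and $x = 5$, three growth steps are needed and $x^{\alpha} = \gamma_g^3 = 8$.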
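Lemma VI.2 (Sandwich lemma): For any pair of nodes p and q with $d_{pq} < r$,

$$B_q(r) \subseteq B_p(\rho r) \subseteq B_q(\rho^2 r).$$

Proof: (1) For any node i satisfying $d_{qi} < r$, i.e., $i \in B_q(r)$, by the definition of the inframetric model,

$$d_{pi} \le \rho \max\{d_{pq}, d_{qi}\} < \rho r,$$

thus $i \in B_p(\rho r)$, that is, $B_q(r) \subseteq B_p(\rho r)$.

(2) For any node j satisfying $j \in B_p(\rho r)$, by the definition of the inframetric model, it follows that

$$d_{qj} \le \rho \max\{d_{pq}, d_{pj}\} < \rho \cdot \rho r = \rho^2 r,$$

since $d_{pq} < r \le \rho r$ and $d_{pj} < \rho r$; thus $j \in B_q(\rho^2 r)$, that is, $B_p(\rho r) \subseteq B_q(\rho^2 r)$.

Combining (1) and (2) concludes the proof. ■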
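The sandwich lemma can be checked empirically on any metric space, because the triangle inequality gives $d_{pi} \le d_{pq} + d_{qi} \le 2\max\{d_{pq}, d_{qi}\}$, so every metric is a 2-inframetric. The Python sketch below (helper names are hypothetical) draws random points in the unit square, uses open balls as in the proof, picks $r > d_{pq}$, and tests both inclusions with ρ = 2.

```python
import math
import random
from typing import List, Sequence, Set, Tuple


def open_ball(points: List[Tuple[float, float]], center: Sequence[float],
              radius: float) -> Set[int]:
    """Indices of the points strictly within `radius` of `center`."""
    return {i for i, p in enumerate(points) if math.dist(p, center) < radius}


def check_sandwich(n: int = 300, trials: int = 200, rho: float = 2.0,
                   seed: int = 7) -> bool:
    """Check B_q(r) <= B_p(rho r) <= B_q(rho^2 r) (as sets) whenever d_pq < r."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    for _ in range(trials):
        p, q = rng.sample(range(n), 2)
        # Choose r strictly larger than d_pq, as the lemma requires.
        r = math.dist(pts[p], pts[q]) * rng.uniform(1.01, 3.0)
        inner = open_ball(pts, pts[q], r)
        middle = open_ball(pts, pts[p], rho * r)
        outer = open_ball(pts, pts[q], rho * rho * r)
        if not (inner <= middle <= outer):
            return False
    return True


print(check_sandwich())  # True
```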
Theorem VI.3 (Sampling efficiency in the growth dimension): For a ρ-inframetric model with growth $\gamma_g > 1$, a service node P, and a DNNS target T satisfying $d_{PT} < r$, when selecting $3\left(\frac{\rho^2}{\beta}\right)^{\alpha}$ nodes uniformly at random from $B_P(\rho r)$ with replacement, with probability at least 95% one of these nodes will lie in $B_T(\beta r)$, where $\log_\rho \gamma_g \le \alpha < 2\log_\rho \gamma_g$ and $\beta < 1$.

Proof: Since $B_T(\beta r) \subseteq B_T(r) \subseteq B_P(\rho r)$ by the sandwich lemma (Lemma VI.2), all nodes covered by $B_T(\beta r)$ are also covered by $B_P(\rho r)$. Therefore, we only need to sample enough nodes in $B_P(\rho r)$ in order to sample a node located in $B_T(\beta r)$.

Furthermore, for the pair of nodes P and T satisfying $d_{PT} < r$, the sandwich lemma also gives

$$|B_P(\rho r)| \le |B_T(\rho^2 r)| = \left|B_T\left(\frac{\rho^2}{\beta} \cdot \beta r\right)\right|.$$

Since $\rho > 1$ and $\beta < 1$, we have $\frac{\rho^2}{\beta} > \rho^2 > \rho$; therefore, the preconditions of Lemma VI.1 hold with $x = \frac{\rho^2}{\beta}$, and by Lemma VI.1 we can relate the ball $B_P(\rho r)$ to the ball $B_T(\beta r)$:

$$|B_P(\rho r)| \le \left(\frac{\rho^2}{\beta}\right)^{\alpha} |B_T(\beta r)|,$$

where $\log_\rho \gamma_g \le \alpha < 2\log_\rho \gamma_g$. Therefore, the probability that a node sampled uniformly from $B_P(\rho r)$ lies in the ball $B_T(\beta r)$ is

$$\frac{|B_T(\beta r)|}{|B_P(\rho r)|} \ge \frac{|B_T(\beta r)|}{\left(\frac{\rho^2}{\beta}\right)^{\alpha} |B_T(\beta r)|} = \left(\frac{\beta}{\rho^2}\right)^{\alpha}.$$

Consequently, the probability that none of the $3\left(\frac{\rho^2}{\beta}\right)^{\alpha}$ samples lies in the ball $B_T(\beta r)$ is at most

$$\left(1 - \left(\frac{\beta}{\rho^2}\right)^{\alpha}\right)^{3\left(\frac{\rho^2}{\beta}\right)^{\alpha}} \le e^{-3} < 0.05.$$

Thus, with probability more than 95% we succeed in locating a node lying in the ball $B_T(\beta r)$ with $3\left(\frac{\rho^2}{\beta}\right)^{\alpha}$ samples. ■

Corollary A.1: For a relaxed inframetric model with growth $\gamma_g$, following the DNNS process in Definition VI.4, the found nearest neighbor is a $\frac{1}{\beta}$-approximation, and the number of search steps is smaller than $\log_{1/\beta} \Delta$, where Δ is the ratio of the maximum delay to the minimum delay over all pairwise delays.

Proof: If a DNNS request is forwarded from node P to node Q, the progress is defined as $\frac{d_{PT}}{d_{QT}}$. According to the DNNS search process, by Theorem VI.3 the progress is at least $\frac{1}{\beta}$ at every node P; therefore, in at most $\log_{1/\beta} \Delta$ steps we reach some node v satisfying $d_{vT} < \frac{1}{\beta} d^*$, where $d^*$ is the minimum delay to the target T, which terminates the DNNS query process since we cannot find suitable next-hop neighbors. Therefore, the found nearest neighbor v is a $\frac{1}{\beta}$-approximation. ■
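The 95% figure in Theorem VI.3 rests on the inequality $(1 - p)^{3/p} \le e^{-3} \approx 0.0498$ with $p = (\beta/\rho^2)^{\alpha}$. The Python sketch below computes the sample count and checks this bound on a grid of p values; `samples_needed` and `failure_bound` are illustrative names, and rounding the sample count up to an integer is an assumption, since the theorem states a real-valued count.

```python
import math


def samples_needed(rho: float, beta: float, alpha: float) -> int:
    """Sample count 3 * (rho^2 / beta)^alpha from Theorem VI.3, rounded up."""
    return math.ceil(3 * (rho ** 2 / beta) ** alpha)


def failure_bound(p: float) -> float:
    """Probability that ceil(3/p) independent samples all miss a ball
    that each sample hits with probability p."""
    return (1 - p) ** math.ceil(3 / p)


print(samples_needed(3.0, 0.5, 1.0))  # 54
print(all(failure_bound(k / 1000) <= math.exp(-3) for k in range(1, 1001)))  # True
```

With ρ = 3, β = 0.5 and α = 1, for example, 54 uniform samples from $B_P(\rho r)$ suffice for the 95% guarantee.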



